Abstract: For Bayesian D-optimal design, we define a singular prior distribution for
the model parameters as a prior distribution such that the determinant of the Fisher
information matrix has a prior geometric mean of zero for all designs. For such a
prior distribution, the Bayesian D-optimality criterion fails to select a design. For
the exponential decay model, we characterize singularity of the prior distribution
in terms of the expectations of a few elementary transformations of the parameter.
For a compartmental model and several multi-parameter generalized linear models,
we establish sufficient conditions for singularity of a prior distribution. For the
generalized linear models we also obtain sufficient conditions for non-singularity.
In the existing literature, weakly informative prior distributions are commonly
recommended as a default choice for inference in logistic regression. Here it is shown
that some of the recommended prior distributions are singular, and hence should
not be used for Bayesian D-optimal design. Additionally, methods are developed
to derive and assess Bayesian D-efficient designs when numerical evaluation of the
objective function fails due to ill-conditioning, as often occurs for heavy-tailed prior
distributions. These numerical methods are illustrated for logistic regression.

Key words and phrases: Compartmental model, exponential decay model, generalized
linear model, ill-conditioning, logistic regression.


1. Introduction 

In recent years, much effort has been devoted to the development of D-optimal
design methods for nonlinear problems; for example, nonlinear models (e.g. Yang
and Stufken (2009, 2012); Yang (2010)), generalized linear models (Khuri,
Mukherjee, Sinha and Ghosh (2006); Woods, Lewis, Eccleston and Russell (2006);
Yang, Zhang and Huang (2011); Yang and Mandal (2015)), and linear models with
mixed effects (Jones and Goos (2009)). In each of these areas, the choice of a
D-optimal design depends on the unknown vector of model parameters,
$\theta \in \Theta \subset \mathbb{R}^p$.

One approach to choosing a design is to make a ‘best guess’ of the parameter
values, and calculate a corresponding locally D-optimal design (Chernoff (1953)),
i.e. $\xi^*_\theta \in \arg\max_{\xi \in \Xi} |M(\xi;\theta)|$, where $M(\xi;\theta)$ is the Fisher
information matrix for design $\xi \in \Xi$, and $\Xi$ is the set of all competing designs.
However, the performance of a locally optimal design may be highly sensitive to
misspecification of the value of $\theta$. A Bayesian approach is therefore often used to
derive designs that are efficient for a variety of plausible values of $\theta$. This approach
requires the adoption of a prior distribution, $\mathcal{P}$, on the parameters, and
maximization of an objective function that quantifies the expected information
contained in the experiment. Throughout, we assume that $\mathcal{P}$ is a probability
measure on the measurable space $(\Theta, \mathcal{S})$, with $\mathcal{S}$ the Borel $\sigma$-algebra
over $\Theta$. A widely used objective function is


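$$\phi(\xi;\mathcal{P}) = \int_\Theta \log|M(\xi;\theta)|\,d\mathcal{P}(\theta), \qquad (1.1)$$

see, for example, Chaloner and Larntz (1989) and Gotwalt, Jones and Steinberg
(2009). We adopt the measure-theoretic formulation of integration, under which
the notation $\int_\Theta g(\theta)\,d\mathcal{P}(\theta) = \infty$ is standard when $g$ is a non-negative
$\mathcal{S}$-measurable function on $\Theta$ (Capinski and Kopp (2004), pp. 77-8). When $g$ is
a general $\mathcal{S}$-measurable function on $\Theta$, it is said that
$\int_\Theta g(\theta)\,d\mathcal{P}(\theta) = -\infty$ if and only if $\int_\Theta g^+(\theta)\,d\mathcal{P}(\theta) < \infty$ and
$\int_\Theta g^-(\theta)\,d\mathcal{P}(\theta) = \infty$, where $g^+(\theta) = \max\{0, g(\theta)\}$
and $g^-(\theta) = \max\{0, -g(\theta)\}$.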


A design that maximizes (1.1) is said to be (pseudo-)Bayesian D-optimal,
and may be used whether or not a Bayesian analysis will be performed (e.g. Woods,
Lewis, Eccleston and Russell (2006)). Maximization of (1.1) is equivalent to
maximization of an asymptotic approximation to the Shannon information gain from
prior to posterior (Chaloner and Verdinelli (1995)).

In nonlinear problems, a singular parameter vector is a $\theta$ such that $M(\xi;\theta)$
has determinant zero for any design $\xi \in \Xi$. For such $\theta$, it is difficult to estimate
the parameters no matter which design is used, often because of a lack of model
identifiability (see Section 2.3). In this situation, the local D-optimality criterion
fails to select a design. The analogue of a singular parameter vector for Bayesian
D-optimality is defined through:
(a) Given $\xi \in \Xi$ and a prior distribution, $\mathcal{P}$, we say that $\xi$ is a Bayesian singular
design with respect to $\mathcal{P}$ if $\phi(\xi;\mathcal{P}) = -\infty$.

(b) Given a prior distribution, $\mathcal{P}$, we say that $\mathcal{P}$ is a singular prior distribution
if all $\xi \in \Xi$ are Bayesian singular with respect to $\mathcal{P}$, or equivalently if
the geometric mean of $|M(\xi;\theta)|$ under $\mathcal{P}$ is zero for all $\xi \in \Xi$. Above, the
geometric mean of a non-negative random variable $X$ is defined as
$E_{\mathrm{G}}(X) = \exp[E\{\log(X)\}]$, with $E_{\mathrm{G}}(X) = 0$ if $E\log(X) = -\infty$
(Feng et al. (2017)).


For a singular prior distribution $\mathcal{P}$, Bayesian D-optimality cannot be used to
select a design, since all designs have the same objective function value
$\phi(\xi;\mathcal{P}) = -\infty$. In many models, such as the exponential decay model and logistic
regression, it is straightforward to detect singular parameter vectors, $\theta$, by inspection
of the information matrix. However, as shown below, it is more difficult to detect
whether $\mathcal{P}$ is a singular prior distribution, except in the case of point priors.

A different, but related, problem is the presence of ill-conditioned information
matrices in a quadrature approximation to (1.1). For several models, this
is likely to occur for a heavy-tailed prior distribution $\mathcal{P}$, even if $\mathcal{P}$ is
theoretically non-singular. Such ill-conditioning causes failure of numerical selection of
Bayesian D-optimal designs.

In this paper, we clarify and extend the set of prior distributions for which
Bayesian D-optimal design is feasible for three important classes of models. In
Sections 2.1, 2.2, and 2.3, respectively, we give examples of singular prior
distributions for the one-factor exponential decay model, a three-parameter
compartmental model, and several multi-factor generalized linear models. In Section 2.3
the default weakly informative prior proposed for logistic regression by Gelman,
Jakulin, Pittau and Su (2008) is shown to be singular. For the exponential and
generalized linear models, sufficient conditions for a prior distribution to be
non-singular are established. These conditions are easily checked to determine if the
Bayesian D-optimality criterion can be used to select designs under $\mathcal{P}$. In
Section 3, novel methods are developed that enable the selection of highly Bayesian
D-efficient designs for logistic regression when the quadrature approximation to
(1.1) is ill-conditioned, thereby facilitating design for heavy-tailed prior
distributions. Finally, we discuss alternative approaches to finding efficient
designs when $\mathcal{P}$ is a singular prior distribution.


2. Singularity of prior distributions for some standard models 
2.1. Exponential decay model 

We derive necessary and sufficient conditions for a prior distribution to be
singular for the exponential decay model, which is used, for example, to model
the concentration of a chemical compound over time. This model is commonly
used as a simple illustrative example of a nonlinear model in the optimal design
of experiments literature, e.g. Dette and Neugebauer (1997); Atkinson (2003);
the results here help to develop our intuition. Two parameterizations are
considered: by rate, $\beta > 0$, and by ‘lifetime’, $\theta = 1/\beta > 0$. For the former, the
model for the response, $y$, in terms of the explanatory variable, $x \geq 0$, is

$$y_i = e^{-\beta x_i} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2),$$

where $i = 1,\ldots,n$, $x_i \geq 0$, and $\sigma > 0$.

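Assume that $\Xi = \mathcal{X}^n$, where $\mathcal{X} = [0, \infty)$. Then design
$\xi = (x_1,\ldots,x_n) \in \Xi$ has information matrix, up to the multiplicative
constant $\sigma^{-2}$,

$$M_\beta(\xi;\beta) = \sum_{i=1}^{n} x_i^2 e^{-2\beta x_i}.$$

Suppose that at least one $x_i > 0$ and let $S_{xx} = \sum_{i=1}^{n} x_i^2$. Then

$$-2\beta \max_{i=1,\ldots,n}\{x_i\} \leq \log|M_\beta(\xi;\beta)| - \log S_{xx} \leq -2\beta \min_{i:\,x_i>0}\{x_i\}. \qquad (2.1)$$

By taking expectations, the following result is obtained.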


Proposition 1. Suppose that at least one $x_i > 0$. Then, for the $\beta$-parameterization,
$\phi(\xi;\mathcal{P}) > -\infty$ if and only if $E_{\mathcal{P}}(\beta) < \infty$.

Here the prior, $\mathcal{P}$, is non-singular provided the rate parameter has finite
expectation, but $\mathcal{P}$ can be singular if the distribution of $\beta$ is heavy-tailed with
infinite mean, e.g. if $\beta$ is half-Cauchy (cf. Polson and Scott (2012)).
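The bounds in (2.1) are linear in $\beta$, which is why finiteness of the objective function reduces to finiteness of the prior mean of $\beta$. The following Python sketch checks the bounds numerically for a few values of $\beta$. It is only an illustration under stated assumptions: the information is taken to be $\sum_i x_i^2 e^{-2\beta x_i}$ (that is, $\sigma = 1$), and the function names and the example design are hypothetical choices, not part of the paper.

```python
import math

def log_info_exp_decay(design, beta):
    """log of sum_i x_i^2 * exp(-2*beta*x_i), computed with log-sum-exp.

    Assumes sigma = 1 (an illustrative simplification). Points with
    x_i = 0 contribute nothing to the sum and are skipped.
    """
    terms = [2.0 * math.log(x) - 2.0 * beta * x for x in design if x > 0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def bounds_2_1(design, beta):
    """Lower and upper bounds on the log information implied by (2.1)."""
    positive = [x for x in design if x > 0]
    log_sxx = math.log(sum(x * x for x in positive))
    lower = log_sxx - 2.0 * beta * max(positive)
    upper = log_sxx - 2.0 * beta * min(positive)
    return lower, upper

design = [0.0, 0.5, 1.0, 2.0]  # hypothetical observation times
for beta in (0.1, 1.0, 50.0, 500.0):
    lower, upper = bounds_2_1(design, beta)
    value = log_info_exp_decay(design, beta)
    print(f"beta={beta:7.1f}  {lower:10.3f} <= {value:10.3f} <= {upper:10.3f}")
```

Working on the log scale keeps the computation finite even for large $\beta$, where the information itself would underflow to zero in floating-point arithmetic.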

For the $\theta$-parameterization, a change-of-variable argument shows that

$$\log|M_\theta(\xi;\theta)| = \log|M_\beta(\xi;\beta)| - 4\log\theta. \qquad (2.2)$$

This enables derivation of the following result; for proof see the supplementary
material.

Proposition 2. For the $\theta$-parameterization, the prior distribution $\mathcal{P}$ is singular
if and only if either $E_{\mathcal{P}}(1/\theta) = \infty$ or $E_{\mathcal{P}}(\log\theta) = \infty$.

In the context of designs that maximize $\phi(\xi;\mathcal{P})$ for nonlinear models, Chaloner
and Verdinelli (1995) refer to potential ‘technical problems using prior
distributions with unbounded support where [...] $M(\xi;\theta)$ may be arbitrarily close to
being singular’. Corollary 1 below shows that, even with bounded support,
seemingly innocuous prior distributions can cause Bayesian D-optimality to fail as a
design selection criterion.


Corollary 1. For the $\theta$-parameterization, the prior distribution $\mathcal{P} = U(0,a)$,
$a > 0$, is singular.

Note that under the prior for $\theta$ in Corollary 1, the corresponding implied prior
distribution for $\beta$ has a proper density, $p(\beta) = 1/(a\beta^2)$ for $\beta > 1/a$. However,
this implied distribution for $\beta$ has unbounded support and is heavy-tailed, such
that $E(\beta) = \infty$. In other words, the implied a priori expectation is that the
decay is very rapid.
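As a brief check of Corollary 1 (a sketch based on Proposition 2 and the implied density quoted above, not a reproduction of the proof in the supplementary material), both divergent expectations can be computed directly for $\theta \sim U(0,a)$:

```latex
E_{\mathcal{P}}(1/\theta)
  = \frac{1}{a}\int_{0}^{a} \frac{1}{\theta}\,d\theta
  = \frac{1}{a}\lim_{\epsilon \downarrow 0}\{\log a - \log \epsilon\}
  = \infty ,
\qquad
E(\beta)
  = \int_{1/a}^{\infty} \beta \cdot \frac{1}{a\beta^{2}}\,d\beta
  = \frac{1}{a}\lim_{b \to \infty}\{\log b + \log a\}
  = \infty .
```

The first divergence is the condition $E_{\mathcal{P}}(1/\theta) = \infty$ of Proposition 2; the second is the same statement expressed in the $\beta = 1/\theta$ parameterization.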


2.2. Compartmental model 

In this section, we derive sufficient conditions for a prior distribution to be
singular for the following three-parameter compartmental model:

$$y_i = \theta_3\{\exp(-\theta_1 x_i) - \exp(-\theta_2 x_i)\} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \qquad (2.3)$$

where $x_i \geq 0$, $i = 1,\ldots,n$, $\theta_2 > \theta_1 > 0$, $\theta_3 > 0$ and $\sigma > 0$. Here, $\Xi = [0,\infty)^n$.

As with the exponential model, often the response $y_i$ is a concentration of a
compound in a system, and the $x_i$ are the observation times. For example,
Atkinson et al. (1993) consider a theophylline kinetics experiment on horses,
finding optimal sampling times for model (2.3) under several different
(pseudo-)Bayesian criteria.

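The information matrix for the $i$th time point is, up to the multiplicative
constant $\sigma^{-2}$,

$$M(x_i;\theta) = \begin{pmatrix}
x_i^2\theta_3^2 e^{-2\theta_1 x_i} & -x_i^2\theta_3^2 e^{-(\theta_1+\theta_2)x_i} & -f_i x_i e^{-\theta_1 x_i} \\
-x_i^2\theta_3^2 e^{-(\theta_1+\theta_2)x_i} & x_i^2\theta_3^2 e^{-2\theta_2 x_i} & f_i x_i e^{-\theta_2 x_i} \\
-f_i x_i e^{-\theta_1 x_i} & f_i x_i e^{-\theta_2 x_i} & f_i^2/\theta_3^2
\end{pmatrix},$$

where $f_i = \theta_3\{\exp(-\theta_1 x_i) - \exp(-\theta_2 x_i)\}$. We have $|M(\xi;\theta)| = 0$ when
(i) $\theta_1 = \theta_2$ or (ii) $\theta_3 = 0$, and $|M(\xi;\theta)| \to 0$ when (iii) $\theta_1 \to \infty$. Physically,
conditions (i) and (iii) correspond to situations where the flow rates in and out of the
compartment are either exactly balanced, or both very rapid. Each of the potentially
very different parameter scenarios in (i)-(iii) results in a similar response profile, in
which the concentration is close to zero throughout the duration of the experiment.
Thus, if such a profile is observed, it is difficult to ascertain which values of the
parameters generated the data.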

From the above, it is clear that for $\mathcal{P}$ to be a non-singular prior distribution
its probability density must not be too highly concentrated near regions where
$\theta_2 = \theta_1$ or $\theta_3 = 0$, nor can the prior for $\theta_1$ be too heavy-tailed. This is formalized
by Proposition 3, for which the following two lemmas are required; proofs are
given in the supplementary material. Let $\delta = \theta_2 - \theta_1 > 0$.

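Lemma 1. We have the following bounds on $\log|M(\xi;\theta)|$,

$$-6\theta_1 x_{\max} \leq \log|M(\xi;\theta)| - 4\log\theta_3 - \log|M_{\delta,1}| \leq -6\theta_1 x_{\min},$$

where $M_{\delta,1}$ is $M_{\delta,\theta_3}$ evaluated at $\theta_3 = 1$, and

$$M_{\delta,\theta_3} = \sum_{i=1}^{n} M_{\delta,\theta_3,i}, \qquad
x_{\min} = \min_{i:\,x_i>0}\{x_i\}, \qquad
x_{\max} = \max_{i=1,\ldots,n}\{x_i\},$$

$$M_{\delta,\theta_3,i} = \begin{pmatrix}
x_i^2\theta_3^2 & -x_i^2\theta_3^2 e^{-\delta x_i} & -x_i\theta_3(1 - e^{-\delta x_i}) \\
-x_i^2\theta_3^2 e^{-\delta x_i} & x_i^2\theta_3^2 e^{-2\delta x_i} & x_i\theta_3 e^{-\delta x_i}(1 - e^{-\delta x_i}) \\
-x_i\theta_3(1 - e^{-\delta x_i}) & x_i\theta_3 e^{-\delta x_i}(1 - e^{-\delta x_i}) & (1 - e^{-\delta x_i})^2
\end{pmatrix}.$$

Lemma 2. If $\int_\Theta \log\delta\,d\mathcal{P}(\theta) = -\infty$, then $E_{\mathcal{P}}(\log|M_{\delta,1}|) = -\infty$.

Proposition 3. Suppose $\int_\Theta \log\theta_3\,d\mathcal{P}(\theta) < \infty$. For model (2.3), the prior
parameter distribution $\mathcal{P}$ is singular if $E_{\mathcal{P}}(\theta_1) = \infty$,
$\int_\Theta \log\theta_3\,d\mathcal{P}(\theta) = -\infty$, or $\int_\Theta \log\delta\,d\mathcal{P}(\theta) = -\infty$.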

Heavy-tailed priors such as the half-Cauchy are increasingly recommended
as weakly informative priors in various models (Gelman et al. (2008); Polson and
Scott (2012)). For model (2.3), $\mathcal{P}$ is singular if $\theta_1$ is half-Cauchy distributed,
although for physiological compartmental models more specific prior information
is often used (Gelman et al. (1996)).
2.3. Generalized linear models

Suppose there are $n$ design points $x_i = (x_{i1},\ldots,x_{iq})^T \in \mathcal{X}$, with responses $y_i$,
$i = 1,\ldots,n$. We assume a generalized linear model (GLM; McCullagh and Nelder
(1989)), thus $y_i$ has an exponential family distribution with mean $\mu_i = \mu(x_i;\beta)$
and variance $\gamma v(\mu_i)$, where $\mu$ satisfies

$$h[\mu(x;\beta)] = \eta(x;\beta) = f^T(x)\beta, \qquad (2.4)$$

with $h$ the link function, $\gamma$ a dispersion parameter, $v$ the variance function, and
$\eta_i = \eta(x_i;\beta)$ the linear predictor. For binomial and Poisson responses, $\gamma = 1$
with variance function $v(\mu) = \mu(1-\mu)$ and $v(\mu) = \mu$, respectively. Above,
$f(x) = (f_0(x),\ldots,f_{p-1}(x))^T$ contains regression functions $f_j : \mathcal{X} \to \mathbb{R}$,
$j = 0,\ldots,p-1$, and $\beta = (\beta_0,\beta_1,\ldots,\beta_{p-1})^T \in \Theta$ is a vector of $p$ regression
parameters. We let $\mathcal{X} = [-1,1]^q$ and $\Xi = \mathcal{X}^n$.

For design $\xi = (x_1,\ldots,x_n)$ and model (2.4),

$$M(\xi;\beta) = \sum_{i=1}^{n} w_i f(x_i) f^T(x_i), \qquad
w(\eta) = \frac{1}{\gamma v(\mu)}\left(\frac{d\mu}{d\eta}\right)^2, \qquad (2.5)$$

with $w_i = w(\eta_i)$, $i = 1,\ldots,n$ (e.g. Khuri et al. (2006), Atkinson and Woods
(2015), Yang and Mandal (2015)).


The following lemmas are important first steps towards the derivation of
results on singular prior distributions. Lemma 3 also facilitates the development
of numerical methods to overcome ill-conditioning in Section 3. The proofs are
straightforward; the details are omitted. Let $F$ be the model matrix with rows
$f^T(x_i)$, noting that $\sum_{i=1}^{n} f(x_i) f^T(x_i) = F^T F$ is the information matrix of $\xi$
under a linear model with regressors specified by $f$. The inequality below is
with respect to the Loewner partial ordering on real symmetric matrices, in
which $M_1 \preceq M_2$ if and only if $M_2 - M_1$ is non-negative definite (for example,
Pukelsheim (1993, p.11)).


Lemma 3. For a generalized linear model, the information matrix satisfies

$$\min_{i=1,\ldots,n}\{w_i\}\, F^T F \preceq M(\xi;\beta) \preceq \max_{i=1,\ldots,n}\{w_i\}\, F^T F.$$

Thus, since the log-determinant respects the Loewner ordering,

$$p\log\min_i\{w_i\} + \log|F^T F| \leq \log|M(\xi;\beta)| \leq p\log\max_i\{w_i\} + \log|F^T F|.$$

Lemma 4. Suppose that $\xi$ is non-singular for the linear model with regressors
given by $f$, that is $|F^T F| > 0$. Then we have the following:

(i) If $E_{\mathcal{P}}\{\log\min_i w_i\} > -\infty$, then $\phi(\xi;\mathcal{P}) > -\infty$, i.e. $\xi$ is Bayesian
non-singular with respect to $\mathcal{P}$ under the GLM.

(ii) If $E_{\mathcal{P}}\{\log\max_i w_i\} = -\infty$, then $\phi(\xi;\mathcal{P}) = -\infty$, i.e. $\xi$ is Bayesian singular
with respect to $\mathcal{P}$ under the GLM.


Lemma 4 can often be used to identify clear conditions on the prior
distribution that lead to singularity (or non-singularity). However, to do so it is necessary
to analyse the tail behaviour of the GLM weight function, $w(\eta)$, as $|\eta| \to \infty$ in
order to establish whether (i) or (ii) above holds. Thus, the results depend upon
which link function is chosen. In the remainder of Section 2.3, results are given
for logistic, probit and Poisson regression.

2.3.1. Logistic regression 

For logistic regression, $y_i \mid \beta \sim \mathrm{Bernoulli}(\pi_i)$, where
$\pi_i = \Pr(y_i = 1 \mid \beta) = \mu(x_i;\beta)$.
The link function is the logit, $h(\pi) = \log\{\pi/(1-\pi)\}$, and

$$w(\eta) = \exp(-|\eta|)\,\mathrm{expit}(|\eta|)^2 \sim \exp(-|\eta|) \quad \text{as } |\eta| \to \infty. \qquad (2.6)$$

Above, $\mathrm{expit}(\eta) = 1/\{1 + e^{-\eta}\}$. Lemma 4 is now used to establish sufficient
conditions for the prior distribution to be non-singular for logistic regression.
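The log-determinant bounds of Lemma 3 can be checked numerically for a small logistic model. The sketch below is an illustration under stated assumptions: it uses a hypothetical two-parameter model with $f(x) = (1, x)^T$, so that the determinant of a weighted information matrix can be written with the Cauchy-Binet identity $\sum_{i<j} w_i w_j (x_i - x_j)^2$; the design, parameter values and function names are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations

def log_weight(eta):
    """log w(eta) = -|eta| + 2*log(expit(|eta|)), see (2.6)."""
    a = abs(eta)
    return -a - 2.0 * math.log1p(math.exp(-a))

def logdet_weighted(design, weights):
    """log|sum_i w_i f(x_i) f(x_i)^T| for f(x) = (1, x)^T.

    Uses the Cauchy-Binet identity det = sum_{i<j} w_i w_j (x_i - x_j)^2.
    """
    pairs = combinations(zip(design, weights), 2)
    det = sum(wi * wj * (xi - xj) ** 2 for (xi, wi), (xj, wj) in pairs)
    return math.log(det)

def lemma3_interval(design, beta0, beta1):
    """Lower and upper bounds on log|M(xi; beta)| from Lemma 3, with p = 2."""
    log_w = [log_weight(beta0 + beta1 * x) for x in design]
    logdet_ftf = logdet_weighted(design, [1.0] * len(design))
    p = 2
    return p * min(log_w) + logdet_ftf, p * max(log_w) + logdet_ftf

design = [-1.0, -0.3, 0.4, 1.0]   # hypothetical design on [-1, 1]
beta0, beta1 = 0.5, 2.0           # hypothetical parameter values
weights = [math.exp(log_weight(beta0 + beta1 * x)) for x in design]
lower, upper = lemma3_interval(design, beta0, beta1)
print(lower, logdet_weighted(design, weights), upper)
```

Both bounds depend on $\beta$ only through the extreme values of $|\eta_i|$, which is what allows conditions on the moments of $\beta$ to control the expectation of the log-determinant.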


Theorem 1. Suppose that $\mathcal{P}$ is such that $E_{\mathcal{P}}(|\beta_j|) < \infty$, for $j = 0,\ldots,p-1$. If $\xi$
is non-singular for the linear model with regressors given by $f$, that is $|F^T F| > 0$,
then $\phi(\xi;\mathcal{P}) > -\infty$, i.e. $\xi$ is also Bayesian non-singular with respect to $\mathcal{P}$ for
the logistic model.

Note that there is no requirement for $\mathcal{P}$ to have bounded support. In
particular, this result provides theoretical reassurance that Bayesian D-optimality can
be used to select a design with a normal or log-normal prior on the parameters.
There is also no requirement for the parameters to be independent a priori. For
example, the result applies to a normal-mixture hierarchical variable selection
prior distribution (Chipman et al. (1997)).

Other important prior distributions do not satisfy the conditions of Theorem
1; for example, that proposed by Gelman, Jakulin, Pittau and Su (2008), which we
denote by $\mathcal{P}_G$. These authors recommended rescaling before fitting the model.
For observational studies, each explanatory variable is transformed to have mean
zero and a standard deviation of 1/2. This ensures that the method reflects
the widely-held default prior belief that higher order interactions are likely to
make a smaller contribution to the linear predictor. The combination of $\mathcal{P}_G$ and
this scaling was shown to give improved predictive performance relative to both
maximum likelihood and penalized logistic regression. An analogue of the above
method for designed experiments is to combine $\mathcal{P}_G$ with a standardization of the
design variables to have range $[-1/2, 1/2]$. This achieves a similar penalization
of higher order interactions.

It is possible to obtain a partial converse to Theorem 1.

Proposition 4. Given $j \in \{0,\ldots,p-1\}$, suppose that:

(i) $\mathcal{P}$ is such that $\Pr(\beta_j > 1) > 0$

(ii) $\mathcal{P}$ is such that, for all $\delta > 0$,
$\Pr(|\beta_k| < \delta \text{ for all } k \neq j \mid \beta_j > 1) > 0$

(iii) $\mathcal{P}$ is such that, for all $\delta > 0$,
$E_{\mathcal{P}}[\beta_j \mid \beta_j > 1, |\beta_k| < \delta \text{ for all } k \neq j] = \infty$

(iv) $\xi$ is such that $\min_{i=1,\ldots,n} |f_j(x_i)| > 0$.

Then $\xi$ is Bayesian singular with respect to $\mathcal{P}$, i.e. $\phi(\xi;\mathcal{P}) = -\infty$.


A more intuitive understanding of the reason that the above conditions lead
to a singular prior distribution can be obtained by considering locally optimal
design. There, we have that $|M(\xi;\beta)| \approx 0$ if the responses are close to
deterministic, i.e. if for all design points the success probability $\Pr(y_i = 1 \mid \beta)$ is close to
either 0 or 1. In that case, there is also a high probability of separation (Albert
and Anderson (1984)) and thus non-existence of maximum likelihood estimates.
For Bayesian design, a heavy-tailed prior satisfying the conditions of Proposition
4 leads to similarly extreme values of the success probability, which is now a
random variable owing to dependence on $\beta$. Specifically, the implied distribution
on $\Pr(y_i = 1 \mid \beta)$ has high concentration near either 0 or 1, in the following sense:

Proposition 5. Under the conditions in Proposition 4, there exists an event
$\mathcal{E} \subseteq \Theta$, with $\Pr(\mathcal{E}) > 0$, conditional upon which either $\Pr(y_i = 1 \mid \beta)$ or
$1 - \Pr(y_i = 1 \mid \beta)$ has prior geometric mean zero, according to whether $f_j(x_i) < 0$ or
$f_j(x_i) > 0$ respectively.

The proofs of Propositions 4 and 5 both rest on the identification of a region,
$\mathcal{E}$, of parameter space where the linear predictor $\eta_i$ can be approximated by the
contribution, $\beta_j f_j(x_i)$, from the $j$th predictor.

The Gelman prior distribution, $\mathcal{P}_G$, places independent standard Cauchy
distributions on $(1/10)\beta_0, (2/5)\beta_1, \ldots, (2/5)\beta_{p-1}$. Thus, the prior distributions
for the regression coefficients are heavy-tailed, with undefined prior mean. The
parameters are expected a priori to have large magnitude, i.e. $E|\beta_k| = \infty$,
$k = 0,\ldots,p-1$. For a model with an intercept term, $f_0(x) = 1$, and Proposition
4 may be applied with $j = 0$; conditions (ii) and (iii) follow since $\beta_0$ is both
heavy-tailed and independent of the other parameters, hence:

Corollary 2. For a logistic model with an intercept term, the prior distribution
$\mathcal{P}_G$ is singular.


Often prior independence of parameters is not a reasonable assumption. For
example, Chipman et al. (1997) define a hierarchical variable selection prior
in which the probability of an interaction term being active is dependent on
whether the parent terms are active, thereby satisfying the weak heredity
principle. Proposition 4 can be used to show that, for logistic regression, a prior with
this hierarchical structure is singular if the prior distribution of the intercept
parameter is a mixture of two scaled zero-mode Cauchy distributions rather than a
mixture of two scaled zero-mean normal distributions. In this case, the intercept
is again both heavy-tailed and (typically) independent of the other parameters.

For logistic models with a single controllable variable, scalar $x \in \mathcal{X}$, $\mathcal{X} = \mathbb{R}$,
Bayesian D-optimal design has also been studied for a different parameterization
(for example, Chaloner and Larntz (1989)):

$$h(\pi_i) = \beta_1(x_i - \mu), \qquad (2.7)$$

which can be obtained from (2.4) via $\beta_0 = -\mu\beta_1$. When $\beta_1 = 0$, $\mu$ is not
identifiable and $|M_\theta(\xi;\theta)| = 0$ for all $\xi \in \Xi$, with $\theta = (\mu, \beta_1)^T$. The
following result, which is straightforward to prove using Theorem 1, provides
sufficient conditions for a prior distribution to be non-singular for this form of
the model.


Proposition 6. For the $(\mu, \beta_1)$-parameterization in (2.7), if (i) $E_{\mathcal{P}}(|\mu\beta_1|) < \infty$,
(ii) $E_{\mathcal{P}}(|\beta_1|) < \infty$ and (iii) $E_{\mathcal{P}}(\log|\beta_1|) > -\infty$, then any design with two or
more support points is Bayesian non-singular with respect to $\mathcal{P}$. Hence (i)-(iii)
are sufficient for $\mathcal{P}$ to be non-singular. In this case, $\xi$ is Bayesian D-optimal for
$(\beta_0, \beta_1)$ if and only if it is Bayesian D-optimal for $(\mu, \beta_1)$.

2.3.2. Probit regression 

For probit regression, $y_i \mid \beta \sim \mathrm{Bernoulli}(\pi_i)$, $\pi_i = \mu(x_i;\beta)$, with link
$h(\pi) = \Phi^{-1}(\pi)$, where $\Phi$ is the standard normal c.d.f. Here,

$$w(\eta) = \frac{\varphi(\eta)^2}{\Phi(\eta)\{1 - \Phi(\eta)\}},$$

where $\varphi(\eta) = (2\pi)^{-1/2} e^{-\eta^2/2}$ is the standard normal p.d.f. The following asymptotic
approximation holds (e.g. Abramowitz and Stegun (1964), p.298):

$$1 - \Phi(\eta) \sim \frac{1}{\eta\sqrt{2\pi}}\, e^{-\eta^2/2} \quad \text{as } \eta \to \infty.$$

Also, as $\eta \to \infty$, $\Phi(\eta) \to 1$, and so by symmetry of $w(\eta)$,

$$w(\eta) \sim \frac{1}{\sqrt{2\pi}}\, |\eta|\, e^{-\eta^2/2} \quad \text{as } |\eta| \to \infty. \qquad (2.8)$$

This asymptotic approximation can be used with Lemma 4 to obtain
analogues of the results for logistic regression, with different conditions on the prior
distribution.
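The tail approximation (2.8) can be examined numerically. The following Python sketch compares the probit weight with its asymptotic form at a few values of $\eta$; the function names are hypothetical, and the upper tail $1 - \Phi(|\eta|)$ is computed with the complementary error function from the standard library to avoid cancellation.

```python
import math

def probit_weight(eta):
    """w(eta) = phi(eta)^2 / [Phi(eta) * (1 - Phi(eta))], using symmetry in eta."""
    density = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(abs(eta) / math.sqrt(2.0))  # 1 - Phi(|eta|)
    return density ** 2 / ((1.0 - upper_tail) * upper_tail)

def probit_weight_asymptotic(eta):
    """Right-hand side of (2.8): |eta| * exp(-eta^2 / 2) / sqrt(2 * pi)."""
    return abs(eta) * math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

for eta in (2.0, 4.0, 8.0):
    ratio = probit_weight(eta) / probit_weight_asymptotic(eta)
    print(f"eta={eta:4.1f}  ratio exact/asymptotic = {ratio:.4f}")
```

The ratio approaches one as $|\eta|$ grows. Since $\log w(\eta)$ then behaves like $-\eta^2/2$, second moments of the parameters play the role that first moments play for the logistic model.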


Theorem 2. If $E_{\mathcal{P}}|\beta_k\beta_l| < \infty$, for $k, l = 0,1,\ldots,p-1$, then $\mathcal{P}$ is non-singular
for the probit regression model.

Proposition 7. Given $j \in \{0,\ldots,p-1\}$, suppose that:

(i) $\mathcal{P}$ is such that $\Pr(\beta_j > 1) > 0$ and, for all $\delta > 0$,
$\Pr(|\beta_k| < \delta \text{ for all } k \neq j \mid \beta_j > 1) > 0$

(ii) $\mathcal{P}$ is such that, for all $\delta > 0$,
$E_{\mathcal{P}}[|\beta_j|^2 \mid \beta_j > 1, |\beta_k| < \delta \text{ for all } k \neq j] = \infty$

(iii) $\xi$ is such that $\min_i |f_j(x_i)| > 0$.

Then, for the probit link, the design $\xi$ is Bayesian singular with respect to $\mathcal{P}$, i.e.
$E_{\mathcal{P}}\log|M(\xi;\beta)| = -\infty$.

Corollary 3. For a probit model with an intercept term, the prior distribution
$\mathcal{P}_G$ is singular.

Again, a heavy-tailed prior on the intercept parameter results in the prior
being singular for Bayesian D-optimality. The intuitive interpretation is similar
to that for the logistic model. Note that $\mathcal{P}_G$ would remain singular even if it were
made somewhat less heavy-tailed, for example by replacing the Cauchy prior on
$\beta_0$ with a $t(2)$ prior. In this case, condition (ii) above will still hold because $\beta_0$
has infinite variance.


2.3.3. Poisson regression 

Consider the model $y_i \mid \beta \sim \mathrm{Poisson}(\lambda_i)$, with $\mu_i = \lambda_i$ and $h(\mu) = \log\mu$. Optimal
designs for this model were considered by Russell et al. (2009) and McGree and
Eccleston (2012). Here, $w(\eta) = \exp(\eta)$ and we have the following results.

Theorem 3. For the Poisson regression model with log link, if $E_{\mathcal{P}}|\beta_k| < \infty$,
$k = 0,\ldots,p-1$, and $|F^T F| > 0$ then $E_{\mathcal{P}}\log|M(\xi;\beta)| > -\infty$. Hence, if the first
moments for $\beta_k$ are finite then $\mathcal{P}$ is non-singular.


Proposition 8. Given $j \in \{0,\ldots,p-1\}$, suppose that:

(i) $\mathcal{P}$ is such that $\beta_j$ is supported on $(-\infty, 0)$

(ii) $\mathcal{P}$ is such that $\Pr(\beta_j < -1) > 0$ and, for all $\delta > 0$,
$\Pr(|\beta_k| < \delta \text{ for all } k \neq j \mid \beta_j < -1) > 0$

(iii) $\mathcal{P}$ is such that, for all $\delta > 0$,
$E_{\mathcal{P}}[\beta_j \mid \beta_j < -1, |\beta_k| < \delta \text{ for all } k \neq j] = -\infty$

(iv) $\mathcal{P}$ is such that $E_{\mathcal{P}}|\beta_k| < \infty$, $k \neq j$

(v) $\xi$ is such that $f_j(x_i) > 0$ for $i = 1,\ldots,n$.

Then $E_{\mathcal{P}}\log|M(\xi;\beta)| = -\infty$, i.e. $\xi$ is singular for the Poisson model with log
link under Bayesian D-optimality.

Corollary 4. For a Poisson model with log link containing an intercept, i.e.
$f_0(x) = 1$, if $\beta_0$ has a negative half-Cauchy prior independently of $\beta_k$,
$k = 1,\ldots,p-1$, with $E_{\mathcal{P}}|\beta_k| < \infty$, then $\mathcal{P}$ is singular.

Here, a heavy-tailed negative intercept parameter can result in a singular
prior. Intuitively, it is clear that large negative values of $\beta_0$ will lead to
experiments where most of the responses are zero, leading to difficulties obtaining
precise estimates of $\beta_0$ and the other parameters.


3. Numerical methods to overcome ill-conditioning 
3.1. Objective function approximation 

In performing a numerical search for Bayesian D-optimal designs it is
necessary to approximate the objective function, usually via a weighted sum,

$$\phi(\xi;\mathcal{P}) \approx \phi(\xi;Q) = \sum_{l=1}^{N_Q} v_l \log|M(\xi;\beta^{(l)})|, \qquad (3.1)$$

over a weighted sample,

$$Q = \begin{Bmatrix} \beta^{(1)} & \cdots & \beta^{(N_Q)} \\ v_1 & \cdots & v_{N_Q} \end{Bmatrix},$$

of parameter vectors, $\beta^{(l)}$, $l = 1,\ldots,N_Q$, with corresponding integration weights
$v_l$, satisfying $\sum_{l=1}^{N_Q} v_l = 1$.

The sample $Q$ may be obtained, for example, by space-filling criteria, as
used by Woods, Lewis, Eccleston and Russell (2006), Latin hypercube sampling,
or a quadrature scheme, such as that applied by Gotwalt, Jones and Steinberg
(2009). Quadrature methods, and in particular the Gotwalt method, can often
yield highly accurate approximations.
A problem with approximation (3.1) is that for multi-parameter models
numerical evaluation of $\phi(\xi;Q)$ can fail due to the presence of ill-conditioned
matrices $M(\xi;\beta^{(l)})$, whose determinant will be estimated numerically as zero. Note
this can occur even for non-singular $\mathcal{P}$; for singular $\mathcal{P}$ there is little point in
evaluating $\phi(\xi;Q)$ since $\phi(\xi;\mathcal{P}) = -\infty$. When numerical evaluation of $\phi(\xi;Q)$ fails
for all $\xi \in \Xi$, we say that $Q$ is an ill-conditioned quadrature scheme. In
principle, $Q$ can be ill-conditioned for any prior distribution. However, for the models
considered here, such as logistic regression, ill-conditioning of $Q$ is more likely
if the underlying prior distribution is heavy-tailed. In that case, there is high
probability of large $\beta$, and so also of $M(\xi;\beta)$ being ill-conditioned. For any prior,
even without heavy tails, other circumstances that may lead to ill-conditioning
of $Q$ include: (i) use of a quadrature scheme, such as the Gotwalt method, which
oversamples the tails of $\mathcal{P}$; and (ii) use of a large number of quadrature points.
In most integration problems, an increased number of quadrature points leads
to improved approximation of the integral; paradoxically, in Bayesian D-optimal
design this may cause numerical evaluation to fail due to ill-conditioning.
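The failure mode described above can be reproduced in a few lines. The Python sketch below is an illustration only: it uses a hypothetical logistic model with $f(x) = (1, x)^T$ and a two-point design, for which $|M(\xi;\beta)| = w_1 w_2 (x_1 - x_2)^2$, and a deliberately extreme parameter vector. In exact arithmetic the determinant is positive, but its double-precision value underflows to zero.

```python
import math

def weight_direct(eta):
    """Logistic GLM weight w(eta) = exp(-|eta|) * expit(|eta|)^2, see (2.6)."""
    a = abs(eta)
    return math.exp(-a) / (1.0 + math.exp(-a)) ** 2

def log_weight(eta):
    """log w(eta), evaluated without forming w(eta) itself."""
    a = abs(eta)
    return -a - 2.0 * math.log1p(math.exp(-a))

def det_two_point(x1, x2, eta1, eta2):
    """|M| for f(x) = (1, x)^T and a two-point design: w1 * w2 * (x1 - x2)^2."""
    return weight_direct(eta1) * weight_direct(eta2) * (x1 - x2) ** 2

def logdet_two_point(x1, x2, eta1, eta2):
    """log|M| for the same design, assembled on the log scale."""
    return log_weight(eta1) + log_weight(eta2) + 2.0 * math.log(abs(x1 - x2))

# Hypothetical extreme parameter vector beta = (0, 400) at x = -1 and x = 1.
print(det_two_point(-1.0, 1.0, -400.0, 400.0))     # underflows to 0.0
print(logdet_two_point(-1.0, 1.0, -400.0, 400.0))  # finite, roughly -798.6
```

The log-scale evaluation works here only because the determinant factorizes for a two-point design with two parameters. For general designs no such factorization is available, which motivates bounding the log-determinant instead.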


3.2. Objective function bounds for logistic regression 

For some important models, it is possible to obtain bounds that allow
approximation of $\phi(\xi;Q)$ when $Q$ is ill-conditioned, as often occurs for heavy-tailed
priors. These bounds may be applied to enable straightforward selection of Bayesian
D-efficient designs for such priors (see Section 3.3). Here we focus on the case
of logistic regression, but a similar approach can be used for the compartmental
model (using Lemma 1), and other GLMs. From Lemma 3 and (2.6), we see that
$\phi(\xi;\beta) = \log|M(\xi;\beta)|$ lies in $[\phi_L(\xi;\beta), \phi_U(\xi;\beta)]$, where

$$\phi_L(\xi;\beta) = \log|F^T F| + p \min_{i=1,\ldots,n}\{-|\eta_i| + 2\log\mathrm{expit}|\eta_i|\},$$
$$\phi_U(\xi;\beta) = \log|F^T F| + p \max_{i=1,\ldots,n}\{-|\eta_i| + 2\log\mathrm{expit}|\eta_i|\}.$$

Let $S$ be the set of $l \in \{1,\ldots,N_Q\}$ for which $M(\xi;\beta^{(l)})$ is ill-conditioned; then

$$\phi_L(\xi;Q) \leq \phi(\xi;Q) \leq \phi_U(\xi;Q), \qquad (3.2)$$

where

$$\phi_L(\xi;Q) = \sum_{l \in \{1,\ldots,N_Q\}\setminus S} v_l \log|M(\xi;\beta^{(l)})| + \sum_{l \in S} v_l \log|F^T F|
+ \sum_{l \in S} v_l\, p \min_{i=1,\ldots,n}\{-|f^T(x_i)\beta^{(l)}| + 2\log\mathrm{expit}|f^T(x_i)\beta^{(l)}|\},$$

$$\phi_U(\xi;Q) = \sum_{l \in \{1,\ldots,N_Q\}\setminus S} v_l \log|M(\xi;\beta^{(l)})| + \sum_{l \in S} v_l \log|F^T F|
+ \sum_{l \in S} v_l\, p \max_{i=1,\ldots,n}\{-|f^T(x_i)\beta^{(l)}| + 2\log\mathrm{expit}|f^T(x_i)\beta^{(l)}|\}.$$

The bounds $\phi_L(\xi;Q)$, $\phi_U(\xi;Q)$ are much better conditioned than $\phi(\xi;Q)$.
The bounds for $\log|M(\xi;\beta^{(l)})|$, $l \in S$, are often wide. However, as the
corresponding $v_l$ is often very small, we may nonetheless obtain from (3.2) a relatively
narrow interval for $\phi(\xi;Q)$. Note that (3.2) specifies an interval that contains
the approximation $\phi(\xi;Q)$, and not necessarily the value of $\phi(\xi;\mathcal{P})$.
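The bounds in (3.2) can be sketched in code as follows. This is a minimal illustration under explicit assumptions, not the implementation used in the paper: the model is a hypothetical two-parameter logistic regression with $f(x) = (1, x)^T$; a parameter vector is placed in $S$ when the directly computed determinant is numerically zero (the paper does not state its criterion for ill-conditioning, so this rule is an assumption); and the weighted sample is a small made-up list rather than a Gotwalt quadrature scheme.

```python
import math
from itertools import combinations

P = 2  # number of parameters in the illustrative model f(x) = (1, x)^T

def log_weight(eta):
    """log w(eta) = -|eta| + 2*log(expit(|eta|)), see (2.6)."""
    a = abs(eta)
    return -a - 2.0 * math.log1p(math.exp(-a))

def direct_logdet(design, beta):
    """Direct evaluation of log|M(xi; beta)|; returns None if it fails."""
    w = [math.exp(log_weight(beta[0] + beta[1] * x)) for x in design]
    det = sum(wi * wj * (xi - xj) ** 2
              for (xi, wi), (xj, wj) in combinations(zip(design, w), 2))
    return math.log(det) if det > 0.0 else None

def quadrature_bounds(design, sample):
    """Return (phi_L, phi_U) of (3.2) for sample = [(beta, v), ...]."""
    logdet_ftf = math.log(sum((xi - xj) ** 2
                              for xi, xj in combinations(design, 2)))
    phi_l = phi_u = 0.0
    for beta, v in sample:
        value = direct_logdet(design, beta)
        if value is not None:
            # l not in S: the term is evaluated directly in both bounds.
            phi_l += v * value
            phi_u += v * value
        else:
            # l in S: replace the term by its Lemma 3 bounds.
            log_w = [log_weight(beta[0] + beta[1] * x) for x in design]
            phi_l += v * (logdet_ftf + P * min(log_w))
            phi_u += v * (logdet_ftf + P * max(log_w))
    return phi_l, phi_u

design = [-1.0, 0.0, 1.0]
sample = [((0.0, 1.0), 0.60), ((1.0, 2.0), 0.39), ((0.0, 900.0), 0.01)]
print(quadrature_bounds(design, sample))
```

In this toy sample the third parameter vector falls in $S$; its individual bounds are far apart, but because its weight is only 0.01 the resulting interval for $\phi(\xi;Q)$ remains usable.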

In the remainder of Section 3 we use the following example to show how the
bounds enable an extension of the set of prior distributions for which Bayesian
D-efficient designs can be obtained in practice. We begin by illustrating the use
of bounds for the objective function.

Example 1. Potato-packing experiment (Woods, Lewis, Eccleston and Russell
(2006)). We use one of the authors’ models, defined by

$$f(x) = (1, x_1, x_2, x_3, x_1x_2, x_1x_3, x_2x_3)^T,$$
$$\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_{12}, \beta_{13}, \beta_{23})^T,$$

where $q = 3$, $x = (x_1, x_2, x_3)^T$. We adopt a different prior distribution, namely
$\log\beta_0 \sim N(-1, 2)$, $\beta_1 \sim N(2, 2)$, $\beta_2 \sim N(1, 2)$, $\beta_3 \sim N(-1, 2)$, and
$\beta_{12}, \beta_{13}, \beta_{23} \sim N(0.5, 2)$ independently. Note that the log-normal prior for the
intercept parameter is heavy-tailed. However, from Theorem 1, the above joint prior
distribution is non-singular.

For a double-replicate of the $2^3$ full factorial design, the value of $\phi(\xi;\mathcal{P})$
was approximated using the Gotwalt quadrature scheme, with 5 radial points
and 4 random rotations. Direct numerical evaluation of $\phi(\xi;Q)$ failed, since
$S$ was non-empty: it contained 39 parameter vectors. However, from (3.2),
$\phi(\xi;Q) \in [-6.85, -6.78]$.
3.3. Use of bounds in design optimization and assessment

The bounds from (3.2) may also be used within an optimization algorithm
to help find Bayesian D-efficient designs. The Bayesian D-efficiency of $\xi$ is

$$\text{Bayes-eff}(\xi;\mathcal{P}) = \exp\{[\phi(\xi;\mathcal{P}) - \phi(\xi^*_{\mathcal{P}};\mathcal{P})]/p\} \times 100\%,$$

where $\xi^*_{\mathcal{P}} \in \arg\max_{\xi \in \Xi} \phi(\xi;\mathcal{P})$ is a Bayesian D-optimal design. Bayesian
D-efficiencies near 100% indicate that $\xi$ achieves a near-optimal trade-off in
performance across the support of the prior distribution for $\beta$.

When $Q$ is well-conditioned, the Bayesian D-efficiency may be approximated
by numerical search for a $\xi^*_Q \in \arg\max_{\xi \in \Xi} \phi(\xi;Q)$ that maximizes the quadrature
approximated objective function, and substitution of the design found into

$$\text{Bayes-eff}(\xi;Q) = \exp\{[\phi(\xi;Q) - \phi(\xi^*_Q;Q)]/p\} \times 100\%.$$

However, if $Q$ is ill-conditioned, for example if $\mathcal{P}$ is heavy-tailed, then this method
fails since (i) $\phi(\xi;Q)$ cannot be evaluated directly, and (ii) $\xi^*_Q$ cannot be found
using a numerical search. We may nonetheless use numerical methods to find
designs $\xi^*_{Q,L}$ and $\xi^*_{Q,U}$ that maximize the lower and upper bounds respectively,
i.e. $\xi^*_{Q,L} \in \arg\max_{\xi \in \Xi} \phi_L(\xi;Q)$ and $\xi^*_{Q,U} \in \arg\max_{\xi \in \Xi} \phi_U(\xi;Q)$. Then a lower
bound for the Bayesian efficiency of $\xi^*_{Q,L}$ can be approximated, via substitution
of the designs found into

$$\text{Bayes-eff}(\xi^*_{Q,L};Q) \geq \exp\{[\phi_L(\xi^*_{Q,L};Q) - \phi_U(\xi^*_{Q,U};Q)]/p\} \times 100\%. \qquad (3.3)$$

To find exact designs that maximize the bounds, we use a continuous co-ordinate
exchange algorithm similar to that of Gotwalt, Jones and Steinberg (2009).
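Once the two maximized bounds are available, evaluating the right-hand side of (3.3) is a one-line computation. The sketch below is illustrative only: the function name is hypothetical, and the numerical inputs are made-up values chosen to show the scale of the result, not output from the paper's example.

```python
import math

def bayes_eff_lower_bound(phi_l_at_xi_l, phi_u_at_xi_u, p):
    """Right-hand side of (3.3), returned as a percentage.

    phi_l_at_xi_l: lower bound phi_L evaluated at the design maximizing phi_L.
    phi_u_at_xi_u: upper bound phi_U evaluated at the design maximizing phi_U.
    p: number of model parameters.
    """
    return 100.0 * math.exp((phi_l_at_xi_l - phi_u_at_xi_u) / p)

# Hypothetical bound values for a model with p = 7 parameters.
print(bayes_eff_lower_bound(-7.00, -6.93, 7))
```

The bound follows because $\phi(\xi^*_{Q,L};Q) \geq \phi_L(\xi^*_{Q,L};Q)$ and $\phi(\xi^*_Q;Q) \leq \phi_U(\xi^*_Q;Q) \leq \phi_U(\xi^*_{Q,U};Q)$, so a small gap between the two maximized bounds certifies high efficiency without evaluating $\phi(\xi;Q)$ itself.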


Example 1 (continued). A co-ordinate exchange algorithm was used, with 100
random starts, to search for $\xi^*_{Q,L}$, $\xi^*_{Q,U}$ among exact designs with $n = 16$ runs.
The quadrature scheme $Q$ was generated using the Gotwalt method, with 3
radial points and one random rotation, yielding a total of 217 support points for
$Q$. The design $\xi^*_{Q,L}$, given in Table 3.1, is very similar to $\xi^*_{Q,U}$: to 1 d.p. the
two are identical. For this $Q$, the objective function $\phi(\xi;Q)$ cannot be computed
exactly due to ill-conditioning. Thus, given an alternative design $\xi'$, e.g. a 16-run
combination of $\xi^*_{Q,L}$ and $\xi^*_{Q,U}$, it is not possible to evaluate whether $\xi'$ has higher
Bayesian D-efficiency than $\xi^*_{Q,L}$. However, the lower bound on the Bayesian
D-efficiency is $\text{Bayes-eff}(\xi^*_{Q,L};Q) \geq 99.4\%$, so any improvement to be gained by
using a different design will be very small.

Table 3.1: Example 1. Bayesian design, $\xi^*_{Q,L}$, that maximizes the lower bound

Run      x1       x2       x3    |  Run      x1       x2       x3
  1    0.456    1.000    1.000   |    9   -1.000   -1.000    1.000
  2   -1.000   -1.000   -1.000   |   10   -0.269    1.000    1.000
  3   -1.000    0.512   -1.000   |   11    1.000   -1.000   -1.000
  4   -0.137   -1.000   -1.000   |   12    1.000   -1.000    0.045
  5    1.000   -1.000    1.000   |   13   -1.000   -1.000   -0.124
  6    1.000    1.000   -1.000   |   14    0.085   -1.000    1.000
  7    1.000   -0.038    1.000   |   15   -1.000    1.000   -0.213
  8   -1.000    1.000    1.000   |   16   -0.149    1.000   -1.000


Note that the computation of the numerical value of the lower bound in (3.3) is approximate since we cannot be certain to have found the global optima of the bounds, although in the above example an assessment of the objective function values from the different random initializations of the algorithm suggests that the number of starts is adequate.

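To assess the performance of a given design, $\xi$, for different $\beta$, we use the local D-efficiency,

\[ \mathrm{eff}(\xi;\beta) = \{|M(\xi;\beta)|/|M(\xi^*_\beta;\beta)|\}^{1/p}, \tag{3.4} \]

where $\xi^*_\beta \in \arg\max_{\xi\in\Xi} |M(\xi;\beta)|$ is a locally D-optimal design. For some $\beta$, $M(\xi;\beta)$ is well-conditioned for most $\xi \in \Xi$. In this case, the local D-efficiency can be approximated by searching numerically for $\xi^*_\beta$, and substituting the design found into (3.4). For other $\beta$, $M(\xi;\beta)$ is ill-conditioned for all $\xi \in \Xi$. Then approximate bounds on the efficiency can be derived by numerical search for the designs $\xi^*_{L,\beta} \in \arg\max_{\xi\in\Xi} \phi_L(\xi;\beta)$ and $\xi^*_{U,\beta} \in \arg\max_{\xi\in\Xi} \phi_U(\xi;\beta)$, and from the fact that

\[ \exp\{\tfrac{1}{p}[\phi_L(\xi;\beta) - \phi_U(\xi^*_{U,\beta};\beta)]\} \leq \mathrm{eff}(\xi;\beta) \leq \exp\{\tfrac{1}{p}[\phi_U(\xi;\beta) - \phi_L(\xi^*_{L,\beta};\beta)]\} . \tag{3.5} \]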
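The two-sided bound (3.5) is straightforward to evaluate once the four bound values are available. The sketch below is a minimal illustration with hypothetical inputs; the function name and the numbers are assumptions. Because an efficiency can never exceed 1, the upper bound is clipped at 1, which also shows how the interval can degenerate to the uninformative [0%, 100%].

```python
import math


def local_efficiency_bounds(phi_L_design, phi_U_design, phi_L_best, phi_U_best, p):
    """Bounds on the local D-efficiency in the style of (3.5).

    phi_L_design, phi_U_design: lower/upper bounds on log|M| at the design assessed.
    phi_L_best: largest lower-bound value found over the design space.
    phi_U_best: largest upper-bound value found over the design space.
    """
    lower = math.exp((phi_L_design - phi_U_best) / p)
    upper = math.exp((phi_U_design - phi_L_best) / p)
    return lower, min(1.0, upper)


# Tight bounds: the efficiency is pinned down fairly well (hypothetical values).
tight = local_efficiency_bounds(-10.2, -10.1, -9.9, -9.8, p=4)

# Loose bounds: the interval is essentially [0, 1] and carries no information.
loose = local_efficiency_bounds(-500.0, -5.0, -480.0, -4.0, p=4)
print(tight, loose)
```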

To visualize the dependence of the local efficiency on the individual parameters, for each regression coefficient $\beta_j$ we plot the approximate mean and 10% and 90% quantiles of the conditional distribution of $\mathrm{eff}(\xi;\beta)$ given $\beta_j$. Owing to the need to search for a locally D-optimal design, evaluation of $\mathrm{eff}(\xi;\beta)$ is computationally intensive. Thus, before computing the conditional mean and quantiles it is advantageous to first build a statistical emulator of $\mathrm{eff}(\xi;\beta)$ as a function of $\beta$, using Gaussian process interpolation. This is analogous to the approach followed in the computer experiments literature when the main effects of a computationally expensive simulator are visualized (e.g. Santner, Williams and Notz (2003, Ch. 7)). A similar method was used by Waite and Woods (2015) to visualize the efficiency profile of Bayesian designs for logistic models with random effects.

Figure 3.1: Robustness comparison of the Bayesian D-efficient design (black lines/points) versus the EW D-optimal design (grey lines/points). Panels (a)-(g): conditional distribution, given $\beta_k$, of the local efficiency, $\mathrm{eff}(\xi;\beta)$, induced by the prior on $\beta$ (solid line, conditional mean; shaded region, 10% and 90% quantiles). Panel (h): marginal distribution of the local efficiency. Panel (i): 2-dimensional projection of the design points.

Example 1 (continued). We consider further the performance of the design, $\xi^*_{\mathcal{Q},L}$, that maximizes the lower bound for $\phi(\xi;\mathcal{Q})$. The support points of the quadrature scheme are used to train the emulator of $\mathrm{eff}(\xi^*_{\mathcal{Q},L};\beta)$. In the example, only three out of the 217 $\beta$ vectors in $\mathcal{Q}$ led to $M(\xi;\beta)$ being ill-conditioned for all $\xi \in \Xi$. For these vectors, the efficiency bounds in (3.5) gave no additional information beyond $\mathrm{eff}(\xi^*_{\mathcal{Q},L};\beta) \in [0\%, 100\%]$. Thus we decided to omit these $\beta$ vectors from the training set, as including the bounds [0%, 100%] would not substantively reduce uncertainty about the efficiency at these $\beta$. Figure 3.1 shows approximations to the conditional mean and conditional quantiles (given $\beta_j$) of the local efficiency, obtained using the emulator and Monte Carlo sampling. Also shown is a kernel density estimate for the marginal distribution of local efficiencies of $\xi^*_{\mathcal{Q},L}$ induced by the prior distribution on $\beta$. This is derived by computing the Kriging-based estimates of $\mathrm{eff}(\xi^*_{\mathcal{Q},L};\beta)$ for a Monte Carlo sample of 10,000 $\beta$ vectors from the prior distribution. From Figure 3.1 it appears that the modal local efficiency of $\xi^*_{\mathcal{Q},L}$ is in the range 55-60%. The lower and upper quartiles of the local efficiency distribution are approximately 46% and 62%. Although at first glance the typical local efficiencies may appear fairly low, it is important to remember that due to the large amount of prior uncertainty here, there will exist no design whose local efficiency is significantly higher than that of $\xi^*_{\mathcal{Q},L}$ uniformly across the entire parameter space. The design $\xi^*_{\mathcal{Q},L}$ achieves a near-optimal trade-off in performance, as quantified by the high estimated Bayesian D-efficiency obtained earlier, across the very different parameter scenarios that are possible under the prior for $\beta$. The design is thus relatively robust. There appear to be no significant areas of the parameter space where the design performance is very poor and, for example, the prior probability that $\mathrm{eff}(\xi^*_{\mathcal{Q},L};\beta) < 0.2$ appears negligible.

For comparison, results are included for the EW D-optimal design, $\xi^*_{EW}$, advocated by Yang et al. (2016), which maximizes

\[ \psi_{EW}(\xi) = \log|E_{\mathcal{P}}\{M(\xi;\beta)\}| = \log\Big|\sum_{i=1}^{n} E_{\mathcal{P}}\{w(\eta_i)\}\, f(x_i) f^{T}(x_i)\Big| . \]

In factorial experiments for logistic regression, Yang et al. (2016) found EW D-optimal designs to be of comparable statistical efficiency to Bayesian D-optimal designs, while requiring less computational effort to obtain. Here, as shown in Figure 3.1(a)-(h), the EW D-optimal design is much less robust than the Bayesian D-efficient design, at least in this case; its local efficiency $\mathrm{eff}(\xi^*_{EW};\beta)$ has generally lower mean, both conditionally on $\beta_j$ and marginally, and the local efficiency also exhibits higher variability. The smaller difference in robustness between Bayesian D-optimal and EW D-optimal designs observed by Yang et al. (2016) may be due to their restriction to a factorial design space: here, the greater performance of the Bayesian D-efficient design appears to be due to the inclusion of a greater number of factor settings in the interior of $(-1, 1)$ (see Figure 3.1(i)).

In addition to the worse statistical performance of the EW D-optimal design in this case, here the computational convenience of EW D-optimal designs over Bayesian D-optimal designs is much reduced. For factorial problems, a reduction in computational cost is achieved by precomputing $E[w(\eta)]$ for every point in the finite design space, enabling faster evaluation of $\psi_{EW}(\xi)$. Here, precomputation is not possible since we have continuous factors and therefore an uncountably infinite design space. One computational benefit of the EW criterion is that it successfully avoids problems with ill-conditioning, but this offers only a minor advantage over Bayesian D-optimality, for which ill-conditioning problems can now be overcome using the bounds developed in Section 3.2.
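To make the EW criterion concrete, the following Python sketch evaluates it for a deliberately small case: logistic regression with a single factor and regressors f(x) = (1, x), with the expectation of the weight estimated by Monte Carlo. The two-parameter model, the independent N(0, 2^2) prior draws and all function names are assumptions made for illustration; they are not the settings of Example 1.

```python
import math
import random


def logistic_weight(eta):
    """Logistic regression weight w(eta) = pi (1 - pi), with pi = 1 / (1 + exp(-eta))."""
    prob = 1.0 / (1.0 + math.exp(-eta))
    return prob * (1.0 - prob)


def expected_weight(x, prior_draws):
    """Monte Carlo estimate of E[w(eta)] at design point x, where eta = b0 + b1 * x."""
    return sum(logistic_weight(b0 + b1 * x) for b0, b1 in prior_draws) / len(prior_draws)


def ew_objective(design, prior_draws):
    """log|sum_i E[w(eta_i)] f(x_i) f(x_i)^T| for f(x) = (1, x)."""
    m00 = m01 = m11 = 0.0
    for x in design:
        w = expected_weight(x, prior_draws)
        m00 += w
        m01 += w * x
        m11 += w * x * x
    det = m00 * m11 - m01 * m01
    return math.log(det) if det > 0 else float("-inf")


rng = random.Random(1)
draws = [(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in range(2000)]  # hypothetical prior
print(ew_objective([-1.0, -1.0, 1.0, 1.0], draws))
```

For a finite candidate set the values returned by `expected_weight` could be cached once per candidate point, which is the precomputation referred to above; with continuous factors no such cache is available.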


4. Discussion 

The central tenet of this paper is that it is not permissible to use a singular prior distribution in conjunction with Bayesian D-optimality as a design selection criterion. Our new theoretical results, summarized below, can help to ascertain whether a prior is singular or non-singular. This is useful, since if it can be demonstrated that the prior is non-singular, then we may proceed to find Bayesian D-optimal designs either using standard methods, or using the new numerical techniques developed in Section 3 if problems are encountered with ill-conditioning. If it is instead demonstrated that the current prior is singular, then $\phi(\xi;\mathcal{P}) = -\infty$ for all $\xi \in \Xi$, meaning that design selection fails. One rough intuitive interpretation of this is that the parameter uncertainty under this prior is so great that any design will have low (local) efficiency across a significant portion of the parameter space. In this case, there are two possibilities: (i) consider a different prior distribution, or (ii) adopt a different design selection criterion. These alternatives are considered in Sections 4.1 and 4.2 respectively.

To summarize our theoretical results, for three generalized linear models we have given conditions that can easily be checked to establish non-singularity of $\mathcal{P}$ and, importantly, identified that a prominent class of default prior distributions for logistic regression should not be used for Bayesian D-optimal design. For the compartmental model in Section 2.2, sufficient conditions were established only for singularity of $\mathcal{P}$, thus highlighting only prior distributions that should not be used. Though desirable, the proof of an inverse result guaranteeing non-singularity seems highly involved and is beyond the scope of this paper. Future work could seek to develop results on singular prior distributions for population pharmacokinetic models, for which optimal sampling times are more commonly sought (e.g. Mentré et al. (1997)). Such models extend (2.3) by allowing subject-specific kinetic parameters.

4.1. Alternative prior distributions 

Often there are multiple plausible candidates for a suitable prior distribution. For example, in the subjectivist framework, informative priors are elicited from expert knowledge by obtaining summaries to which a probability distribution may be fitted (e.g. Garthwaite et al. (2005), Oakley and O'Hagan (2007)). Typically there will be multiple distributions that fit the observed summaries. Away from this approach, if using uninformative or weakly informative priors there are still often multiple possible candidate priors. Thus, if design selection fails because $\mathcal{P}$ is singular but there exists an alternative candidate prior $\mathcal{P}'$ that is non-singular, then Bayesian D-optimality may be used with $\mathcal{P}'$ instead. Nonetheless, if adopting a subjectivist viewpoint we should be careful to avoid selecting prior distributions purely for analytical convenience if they do not accurately represent the available expert belief or knowledge.


As an example, in Section 2.1 we found that $\mathcal{P}: \theta \sim U(0, a)$ is singular for the exponential regression model. A natural question is whether it is sufficient to find designs for the non-singular prior $\mathcal{P}_\varepsilon: \theta \sim U(\varepsilon, a)$ for some small value of $\varepsilon$ (e.g. 10“^ or 10“®). The adequacy of $\mathcal{P}_\varepsilon$ as a representation of the expert's beliefs will depend substantially on the specifics of the application. For small $\varepsilon$, the quartiles of $\mathcal{P}$ and $\mathcal{P}_\varepsilon$ are similar, thus for example it is possible for both distributions to fit expert statements obtained by the bisection method (Garthwaite et al. (2005)). However, the implication of $\mathcal{P}_\varepsilon$ that there is zero probability that $\theta < \varepsilon$ is too strong unless the expert is certain that $\theta \geq \varepsilon$. The fidelity of the representation $\mathcal{P}_\varepsilon$ would be less important if the resulting design decision were insensitive to the choice of $\varepsilon$. Unfortunately this is not the case, as shown by the proposition below and its proof (in the supplementary material). Intuitively, as $\varepsilon \to 0$, some points in the Bayesian D-optimal design for $\mathcal{P}_\varepsilon$ will converge to zero (while never being equal to zero).

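Proposition 9. For the exponential model, if $\xi$ does not vary with $\varepsilon$ then $\text{Bayes-eff}(\xi;\mathcal{P}_\varepsilon) \to 0$ as $\varepsilon \to 0$.

Thus, even if one were to compute the Bayesian D-optimal design for $\mathcal{P}_{\varepsilon'}$, with say $\varepsilon'$ = 10“®, the resulting design would be highly inefficient when evaluated under $\mathcal{P}_\varepsilon$ for $\varepsilon \ll \varepsilon'$.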
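The sensitivity to $\varepsilon$ can be illustrated numerically for one-run designs, following the construction used in the proof of Proposition 9. The sketch below assumes the $\beta$-parameterization with $\beta = 1/\theta$ and a one-run log-determinant of the form $2\log x - 2\beta x$, and compares a fixed design point with the reference point $x_\varepsilon = -1/\log\varepsilon$; the values a = 1 and x = 0.5 and the function names are hypothetical choices for illustration.

```python
import math


def prior_mean_beta(eps, a):
    """E(beta) = E(1/theta) when theta ~ U(eps, a)."""
    return (math.log(a) - math.log(eps)) / (a - eps)


def phi_one_run(x, eps, a):
    """Bayesian objective of a one-run design at x: E[2 log x - 2 beta x]."""
    return 2.0 * math.log(x) - 2.0 * x * prior_mean_beta(eps, a)


def relative_efficiency(x, eps, a):
    """Efficiency of the fixed point x relative to the point x_eps = -1 / log(eps)."""
    x_eps = -1.0 / math.log(eps)
    return math.exp(phi_one_run(x, eps, a) - phi_one_run(x_eps, eps, a))


results = {eps: relative_efficiency(0.5, eps, a=1.0) for eps in (1e-2, 1e-4, 1e-8)}
for eps, value in results.items():
    print(eps, value)
```

Since the relative efficiency is an upper bound on the Bayesian D-efficiency of the fixed design, its decay as $\varepsilon$ shrinks is consistent with the statement of the proposition.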

The situation above is somewhat similar to problems in the objective Bayesian approach with improper uninformative priors (e.g. Berger (1985, Ch. 3); Berger (2006)), which one may need to modify in order to obtain a proper posterior. For example, if an improper prior, say $U(10, \infty)$, does not give a proper posterior, one might attempt to replace it with $U(10, M)$, with $M$ large, e.g. 10® or 10®. However, the results would often be highly sensitive to the value chosen for $M$, which is arbitrary and typically has no objective justification. For further discussion on the role of prior information in design of experiments, see Woods et al. (2016).


4.2. Alternative selection criteria 

If all candidate prior distributions that agree with the elicited prior knowledge or beliefs are singular, then a Bayesian D-optimal design cannot be found and it is necessary to use an alternative design selection criterion that suffers from fewer problems with singularities. One such criterion that has already been mentioned is EW D-optimality (Yang et al. (2016)). Note that the numerical results in Section 3.3 suggest that if the problem is with an ill-conditioned $\mathcal{Q}$ rather than a singular $\mathcal{P}$, then the EW D-optimal design may be less robust than a Bayesian D-efficient design found using the numerical methods developed here. Another alternative is to select $\xi$ to maximize the mean local efficiency,

\[ E_{\mathcal{P}}\{\mathrm{eff}(\xi;\theta)\}, \]

which is fairly insensitive to the presence of $\theta$ with $|M(\xi;\theta)| \approx 0$. This is a special case of the objective function discussed by Dette and Wong (1996) ($\Phi_1$ in their notation). Unlike Bayesian D-optimality, neither of the above alternative criteria has the interpretation of approximate equivalence to the maximization of Shannon information gain. As an example of the use of the mean local efficiency criterion, consider again the exponential decay model from Section 2.1. From the corollary given there, when $\mathcal{P} = U(0, a)$, $a > 0$, all designs are Bayesian (D-)singular with respect to $\mathcal{P}$ for the $\theta$-parameterization. By contrast, it is shown easily that the design with a single support point $x = a/2$ is optimal for the mean local efficiency criterion, with a mean efficiency of approximately 67%. This design is locally D-optimal when $\theta$ is equal to its prior mean, but highly inefficient when $\theta$ is small. Thus, designs that maximize the mean local efficiency are much less strongly driven by their worst-case behaviour. As with EW D-optimality, if the problem is in fact with an ill-conditioned $\mathcal{Q}$ rather than a singular $\mathcal{P}$, then it is possible that designs maximizing the mean local efficiency may be less robust than Bayesian D-optimal or Bayesian D-efficient designs.
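The mean-efficiency calculation for the exponential decay example can be checked numerically. The sketch below restricts attention to one-run designs and assumes the local efficiency takes the form $(x/\theta)^2\exp(2 - 2x/\theta)$, which follows from a one-run log-determinant $2\log x - 2x/\theta - 4\log\theta$ with locally optimal point $x = \theta$. Under these assumptions the mean efficiency of the point $x = ca$ is $(c/2)e^{2-2c}$, which is largest at $c = 1/2$, where it equals $e/4 \approx 0.68$, close to the value quoted above. The value a = 2 and the function names are hypothetical.

```python
import math


def local_efficiency(x, theta):
    """Local D-efficiency of a one-run design at x (assumed form, p = 1)."""
    ratio = x / theta
    return ratio ** 2 * math.exp(2.0 - 2.0 * ratio)


def mean_efficiency(x, a, n=20_000):
    """Midpoint-rule approximation of E[eff(x; theta)] for theta ~ U(0, a)."""
    h = a / n
    return sum(local_efficiency(x, (k + 0.5) * h) for k in range(n)) * h / a


a = 2.0
at_half = mean_efficiency(0.5 * a, a)
below = mean_efficiency(0.4 * a, a)
above = mean_efficiency(0.6 * a, a)
print(at_half, math.e / 4)
```

The integrand vanishes rapidly as $\theta \to 0$, so the singular behaviour of the local efficiency near zero contributes little to the mean, in line with the remark that this criterion is less driven by worst-case behaviour.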

A further alternative approach to design selection under parameter uncertainty is to consider maximin designs. In the case of greatest interest in this paper, $\Theta$ is such that $\inf_{\theta\in\Theta} |M(\xi;\theta)| = 0$ for all $\xi \in \Xi$, thus design selection clearly fails using the unstandardized maximin D-criterion (Imhof (2001)). Often design selection will also fail when using the standardized maximin D-criterion. It is clear that the Bayesian approach, under suitable prior distributions, benefits from greater robustness to the presence of singular $\theta$ than the use of maximin criteria. For related results see Braess and Dette (2007), where conditions are established under which the number of support points in a standardized maximin or Bayesian D-optimal approximate design grows arbitrarily large as $\Theta$ is expanded. In contrast, in the work presented here it is supposed that $\Theta$ is fixed and the focus is instead on examining the adequacy of the set of competing exact designs under different prior distributions. Here, results have also been developed for additional multiparameter models and numerical methods proposed.


Supplementary material 

The online supplementary material for this paper contains proofs of the 
analytical results described in the text. 
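S1. Proofs of analytical results

Proof of Proposition. Assume that at least one $x_i > 0$. For the $\theta$-parameterization, we demonstrate two implications: (i) if $E_{\mathcal{P}}(1/\theta) < \infty$ and $E_{\mathcal{P}}(\log\theta) < \infty$, then $\phi_\theta(\xi;\mathcal{P}) > -\infty$; and (ii) if $E_{\mathcal{P}}(\log\theta) = \infty$ or $E_{\mathcal{P}}(1/\theta) = \infty$, then $\phi_\theta(\xi;\mathcal{P}) = -\infty$. Here, $\phi_\theta(\xi;\mathcal{P}) = E_{\mathcal{P}}\{\log|M_\theta(\xi;\theta)|\}$, where $\log|M_\theta(\xi;\theta)|$ is given by (2.2).

For (i), observe that $-\infty < E_{\mathcal{P}}\{(2/\theta)\max_{i=1,\ldots,n}\{x_i\} + 4\log\theta\} < \infty$. Considering the left hand side of (2.1) and the reparameterization (2.2),

\[ -\infty < \log\sum_{i=1}^{n} x_i^2 - E_{\mathcal{P}}\Big\{(2/\theta)\max_{i=1,\ldots,n}\{x_i\} + 4\log\theta\Big\} \leq \phi_\theta(\xi;\mathcal{P}), \]

as required. For (ii), note that in addition to (2.1), the following weaker inequality holds:

\[ \log|M_\theta(\xi;\theta)| \leq \log\sum_{i=1}^{n} x_i^2 - 4\log\theta . \]

Taking expectations of both sides, if $E_{\mathcal{P}}(\log\theta) = \infty$ then $\phi_\theta(\xi;\mathcal{P}) = -\infty$. For the other case, let

\[ b(\theta) = \frac{1}{\theta}\Big[2\min_{i=1,\ldots,n}\{x_i : x_i > 0\} + 4\theta\log\theta\Big] . \]

Since $\theta\log\theta \to 0$ as $\theta \to 0$, there is some $\delta > 0$ such that, for $\theta < \delta$,

\[ b(\theta) > (1/\theta)\min_{i=1,\ldots,n}\{x_i : x_i > 0\} . \]

Hence, with $I$ denoting an indicator function,

\[ E_{\mathcal{P}}\{b(\theta)\} \geq E_{\mathcal{P}}\Big\{b(\theta) I(\theta < \delta) + \inf_{\theta\geq\delta} b(\theta)\, I(\theta \geq \delta)\Big\} \geq \min\{x_i : x_i > 0\}\, E_{\mathcal{P}}\{(1/\theta) I(\theta < \delta)\} + (4\log\delta)\Pr(\theta \geq \delta) . \tag{S1.1} \]

If $E_{\mathcal{P}}(1/\theta) = \infty$, then $E_{\mathcal{P}}\{(1/\theta) I(\theta < \delta)\} = \infty$, and so by (S1.1), we have that $E_{\mathcal{P}}\{b(\theta)\} = \infty$, regardless of whether $E_{\mathcal{P}}(\log\theta) = -\infty$. Recall from (2.1) that

\[ \phi_\theta(\xi;\mathcal{P}) \leq \log\sum_{i=1}^{n} x_i^2 - E_{\mathcal{P}}\{b(\theta)\} . \]

Hence if $E_{\mathcal{P}}(1/\theta) = \infty$, we have $\phi_\theta(\xi;\mathcal{P}) = -\infty$. This is sufficient to establish the proposition. □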


Proof of Lemma\^ Observe that M{xi,9) = where is 

defined in the statement of the lemma. Moreover, for $i = 1, \ldots, n$, either (i) $x_i = 0$ or (ii) $x_i \geq x_{\min}$. In (ii), we have


- 26 »lXn 




^ M{xi, 9) < . (SI.2) 

Moreover, the above holds also in (i) since then M{xi, 9) and are 

matrices of zeroes. Summing (S1.2) over $i = 1, \ldots, n$, we obtain:

— 20ia::max ]\/f^ „ ^ ]\/f {C’ 




’M; 


( 5,03 • 


(S1.3) 


Taking log-determinants throughout (S1.3) yields the result, when combined with the fact that the determinant of the bounding matrix in (S1.3) equals $\theta_3^4 |M_{\delta,1}|$. □


Define $g_\xi(\delta) = |M_{\delta,1}|$. The following is needed to establish the lemma proved next.

Lemma 5. Suppose that $\xi$ contains at least three distinct $x_i > 0$. Then the derivatives of $g_\xi(\delta)$ satisfy: (i) $g_\xi^{(k)}(0) = 0$, $k = 1, \ldots, 7$, (ii) $g_\xi^{(8)}(0) > 0$.

Proof of Lemma 5. Part (i) can be verified using symbolic computation, e.g. Mathematica. It can also be shown that

\[ g_\xi^{(8)}(0) = 280\{S_2 S_4 S_6 - S_2 S_5^2 - S_3^2 S_6 + S_3 S_4 S_5 + S_3 S_4 S_5 - S_4^3\}, \]

where $S_j = \sum_{i=1}^{n} x_i^j$. Define the following,

\[ K = \begin{pmatrix} S_2 & S_3 & S_4 \\ S_3 & S_4 & S_5 \\ S_4 & S_5 & S_6 \end{pmatrix}, \qquad K' = \sum_{i : x_i > 0} \begin{pmatrix} 1 & x_i & x_i^2 \\ x_i & x_i^2 & x_i^3 \\ x_i^2 & x_i^3 & x_i^4 \end{pmatrix}, \qquad x_{\min} = \min\{x_i : x_i > 0\} . \]

Note that $K \succeq x_{\min}^2 K'$. We have

\[ g_\xi^{(8)}(0) = 280|K| \geq 280\, x_{\min}^6 |K'| . \]

Observe also that $K'$ is the information matrix of the design $\xi' = \{x_i : x_i > 0\}$ under the linear model with regressors $1, x, x^2$. By the assumption that there are at least three distinct $x_i > 0$, the above linear model is estimable and so $|K'| > 0$. This establishes part (ii). □
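The determinant inequality used for part (ii) can be checked numerically for a particular design. The Python sketch below builds $K$ from the moments $S_2, \ldots, S_6$ and $K'$ as the quadratic-regression information matrix on the positive design points, as read from the proof above; the example design and the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def moment(xs, j):
    """S_j = sum of x^j over the design points."""
    return sum(x ** j for x in xs)


def K_matrix(xs):
    """Matrix with (r, c) entry S_{r + c + 2}, r, c = 0, 1, 2."""
    return [[moment(xs, r + c + 2) for c in range(3)] for r in range(3)]


def K_prime(xs):
    """Information matrix for regressors (1, x, x^2) on the points with x > 0."""
    pos = [x for x in xs if x > 0]
    return [[moment(pos, r + c) for c in range(3)] for r in range(3)]


xs = [0.0, 0.5, 1.0, 2.0]  # hypothetical design with three distinct positive points
x_min = min(x for x in xs if x > 0)
det_K = det3(K_matrix(xs))
det_K_prime = det3(K_prime(xs))
print(det_K, x_min ** 6 * det_K_prime)
```

With fewer than three distinct positive points $K'$ is singular, so the lower bound is zero and part (ii) can no longer be concluded from it.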

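Proof of Lemma. If $\xi$ has fewer than three distinct $x_i > 0$, then we have $\mathrm{rank}(M_{\delta,1}) \leq 2$ and $E_{\mathcal{P}}(\log|M_{\delta,1}|) = -\infty$ for any prior $\mathcal{P}$. Thus we may assume that $\xi$ has at least three distinct $x_i > 0$. From Lemma 5 it is clear that $g_\xi(\delta) \sim (\kappa/2)\delta^8$ for small $\delta$, where $\kappa > 0$. We show that the approximation is sufficiently close that $E_{\mathcal{P}}(\log|M_{\delta,1}|) = -\infty$ if $\int_{\delta<1} \log\delta \, d\mathcal{P}(\theta) = -\infty$. By Taylor's theorem, there is an $\epsilon_1 > 0$ and $\lambda > 0$ such that, for $\delta \in (0, \epsilon_1)$,

\[ |g_\xi(\delta) - (\kappa/2)\delta^8| \leq \lambda\delta^9 . \]

Hence, for $\delta \in (0, \epsilon_1)$,

\[ |2 g_\xi(\delta)/(\delta^8\kappa) - 1| \leq (2\lambda/\kappa)\delta . \]

As the logarithm function has derivative 1 at argument 1, there exists $0 < \epsilon_2 < \epsilon_1$ such that for $\delta \in (0, \epsilon_2)$,

\[ \Big|\log\frac{2 g_\xi(\delta)}{\delta^8\kappa} - \log 1\Big| \leq 2|2 g_\xi(\delta)/(\delta^8\kappa) - 1| \leq (4\lambda/\kappa)\delta . \]

Thus, for $\delta \in (0, \epsilon_2)$,

\[ |\log g_\xi(\delta) - \log(\kappa\delta^8/2)| \leq (4\lambda/\kappa)\delta , \]

so that

\[ \Big|\int_{\delta<\epsilon_2} \log g_\xi(\delta)\, d\mathcal{P}(\theta) - \int_{\delta<\epsilon_2} \{8\log\delta + \log(\kappa/2)\}\, d\mathcal{P}(\theta)\Big| \leq (4\lambda/\kappa)\epsilon_2 . \]

Hence it is clear that $\int_{\delta<\epsilon_2} \log g_\xi(\delta)\, d\mathcal{P}(\theta) = -\infty$ if and only if $\int_{\delta<\epsilon_2} \log\delta\, d\mathcal{P}(\theta) = -\infty$. Further, $g_\xi(\delta)$ is bounded above, and

\[ \int \log g_\xi(\delta)\, d\mathcal{P}(\theta) = \int_{\delta<\epsilon_2} \log g_\xi(\delta)\, d\mathcal{P}(\theta) + \int_{\delta\geq\epsilon_2} \log g_\xi(\delta)\, d\mathcal{P}(\theta) . \]

Thus, $\int \log g_\xi(\delta)\, d\mathcal{P}(\theta) = -\infty$ when $\int_{\delta<\epsilon_2} \log\delta\, d\mathcal{P}(\theta) = -\infty$. The result is duly established by observing that $\int_{\delta<\epsilon_2} \log\delta\, d\mathcal{P}(\theta) = -\infty$ if we have $\int_{\delta<1} \log\delta\, d\mathcal{P}(\theta) = -\infty$. □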


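Proof of Proposition. From the lemma bounding $\log|M(\xi;\theta)|$,

\[ \log|M(\xi;\theta)| \leq -6\theta_1 x_{\min} + 4\log\theta_3 + \log|M_{\delta,1}| . \tag{S1.4} \]

It can be shown that $|M_{\delta,1}|$ is bounded above by a polynomial in the design moments $S_0$ and $S_2$, thus $\int \log|M_{\delta,1}|\, d\mathcal{P}(\theta) < \infty$. As $\theta_1 > 0$, $\int -6\theta_1 x_{\min}\, d\mathcal{P}(\theta) \leq 0 < \infty$. If $\int \log\theta_3\, d\mathcal{P}(\theta) < \infty$, as assumed, then all terms on the right hand side of (S1.4) have integral $< \infty$ and

\[ \int \log|M(\xi;\theta)|\, d\mathcal{P}(\theta) \leq \int -6\theta_1 x_{\min}\, d\mathcal{P}(\theta) + 4\int \log\theta_3\, d\mathcal{P}(\theta) + \int \log|M_{\delta,1}|\, d\mathcal{P}(\theta) . \]

Hence if, in addition to $\int \log\theta_3\, d\mathcal{P}(\theta) < \infty$, we have that at least one of $\int \log|M_{\delta,1}|\, d\mathcal{P}(\theta) = -\infty$, $\int -6\theta_1 x_{\min}\, d\mathcal{P}(\theta) = -\infty$, or $\int \log\theta_3\, d\mathcal{P}(\theta) = -\infty$ holds, then also $\int \log|M(\xi;\theta)|\, d\mathcal{P}(\theta) = -\infty$. Using the lemma proved above, the condition $\int \log|M_{\delta,1}|\, d\mathcal{P}(\theta) = -\infty$ in the preceding statement may be replaced by $\int_{\delta<1} \log\delta\, d\mathcal{P}(\theta) = -\infty$. This establishes the result. □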


Proof of Theorem. It follows from the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$ that

\[ \log|M(\xi;\beta)| \geq \log|F^{T}F| + p\min_i \log w_i . \]

From (2.6), $w(\eta) \geq (1/4)e^{-|\eta|}$. Thus,

\[ \log|M(\xi;\beta)| \geq \log|F^{T}F| + p\log\big[(1/4)e^{-\max_i|\eta_i|}\big] \geq \log|F^{T}F| - p\max_i|\eta_i| - p\log 4 . \]

Moreover, by the triangle inequality, $\max_i|\eta_i| \leq \sum_j \max_i|f_j(x_i)||\beta_j|$, and hence

\[ \log|M(\xi;\beta)| \geq \log|F^{T}F| - p\log 4 - p\sum_j \max_i|f_j(x_i)|\,|\beta_j| . \tag{S1.5} \]

The right hand side of (S1.5) has expectation greater than $-\infty$ due to the assumptions that $E_{\mathcal{P}}(|\beta_j|) < \infty$ and $|F^{T}F| > 0$. Therefore we have that $E_{\mathcal{P}}\{\log|M(\xi;\beta)|\} > -\infty$. □

Proof of Proposition. From the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$,

\[ \log|M(\xi;\beta)| \leq \log|F^{T}F| + p\max_i \log w_i . \]

It can also be shown that $w(\eta)$ is a decreasing function of $|\eta|$ and, from (2.6), that $w(|\eta|) \leq \exp(-|\eta|)$. Hence,

\[ \log|M(\xi;\beta)| \leq \log|F^{T}F| + p\log w(\min_i|\eta_i|) \leq \log|F^{T}F| - p\min_i|\eta_i| . \]

It remains to prove $E_{\mathcal{P}}(\min_i|\eta_i|) = \infty$ to establish that $E_{\mathcal{P}}\{\log|M(\xi;\beta)|\} = -\infty$. This is achieved by conditioning on an event where the parameter $\beta_j$ dominates. Given $j \in \{0, \ldots, p-1\}$, let $\mathcal{E} \in \mathcal{S}$ be an event such that (a) $\beta_j > 1$, and (b) $\sum_{k\neq j}|f_k(x_i)||\beta_k| < \epsilon$ for all $i$, where $\epsilon > 0$ is such that

\[ \big||f_j(x_i)| - |f_j(x_{i'})|\big| > 2\epsilon \quad \text{for any } i, i' \text{ with } |f_j(x_i)| \neq |f_j(x_{i'})| . \]

We can guarantee (a) and (b), for example by taking

\[ \mathcal{E} = \{\beta : \beta_j > 1,\ |\beta_k| < \delta \text{ for } k \neq j\} \in \mathcal{S}, \tag{S1.6} \]

with $\delta = \epsilon/[(p-1)\max_{i,k}|f_k(x_i)|]$. The above satisfies

\[ \Pr(\mathcal{E}) = \Pr(\beta_j > 1)\Pr(|\beta_k| < \delta \text{ for all } k \neq j \mid \beta_j > 1) > 0, \]

by assumptions (i) and (ii) of the proposition.

By the reverse triangle inequality and from (b), on event $\mathcal{E}$,

\[ \big||\eta_i| - |f_j(x_i)|\beta_j\big| \leq \sum_{k\neq j}|f_k(x_i)||\beta_k| < \epsilon . \tag{S1.7} \]

Since on $\mathcal{E}$ the term from $\beta_j$ dominates, the minimum of $|\eta_i|$ is found by minimizing the $\beta_j$ term. To see this formally, observe that if $|f_j(x_i)|\beta_j > |f_j(x_{i'})|\beta_j$, then by the definition of $\epsilon$,

\[ |f_j(x_i)|\beta_j - |f_j(x_{i'})|\beta_j > 2\epsilon\beta_j > 2\epsilon, \]

and, by also using (S1.7),

\[ |\eta_{i'}| < |f_j(x_{i'})|\beta_j + \epsilon < |f_j(x_i)|\beta_j - \epsilon < |\eta_i| . \]

Thus, on $\mathcal{E}$, if $|f_j(x_i)|\beta_j > |f_j(x_{i'})|\beta_j$ then $|\eta_i| > |\eta_{i'}|$. This can be used to show that on $\mathcal{E}$, if $i^* \in \arg\min_i|\eta_i|$ then $i^* \in \arg\min_i|f_j(x_i)|$, as follows. Suppose that $i^* \notin \arg\min_i|f_j(x_i)|$, then there would exist some $i$ such that $|f_j(x_{i^*})|\beta_j > |f_j(x_i)|\beta_j$. By the above, we would have that $|\eta_{i^*}| > |\eta_i|$, which contradicts the definition of $i^*$ as a member of $\arg\min_i|\eta_i|$. Hence,

\[ \min_i|\eta_i| = |\eta_{i^*}|, \qquad i^* \in \arg\min_i|f_j(x_i)| . \]

Consequently,

\[ E_{\mathcal{P}}(\min_i|\eta_i| \mid \mathcal{E}) \geq |f_j(x_{i^*})|\, E_{\mathcal{P}}(\beta_j \mid \mathcal{E}) - \epsilon = \infty \]

by assumptions (iii) and (iv) of the proposition. For the marginal expectation, note that $\Pr(\mathcal{E}) > 0$, and hence

\[ E_{\mathcal{P}}(\min_i|\eta_i|) \geq \Pr(\mathcal{E})\, E_{\mathcal{P}}(\min_i|\eta_i| \mid \mathcal{E}) = \infty . \]

□


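Proof of Proposition. Case (i): assume that $f_j(x_i) > 0$. On the event $\mathcal{E}$ defined in the previous proof, we have that $\eta_i > f_j(x_i)\beta_j - \epsilon$. Hence $E(\eta_i \mid \mathcal{E}) = \infty$. Let $P = \Pr(y_i = 1 \mid \beta)$, noting that $1 - \Pr(y_i = 1 \mid \beta) \leq e^{-\eta_i}$. Then,

\[ E^{G}(1 - P \mid \mathcal{E}) = \exp E\{\log(1 - P) \mid \mathcal{E}\} \leq \exp E(-\eta_i \mid \mathcal{E}) = 0, \]

where $E^{G}$ denotes the geometric mean. Hence the conditional geometric mean of $1 - P$ is zero.

Case (ii): assume $f_j(x_i) < 0$. On $\mathcal{E}$, $\eta_i < f_j(x_i)\beta_j + \epsilon$ and so $E(\eta_i \mid \mathcal{E}) = -\infty$. However, $P \leq e^{\eta_i}$ and so $E^{G}(P \mid \mathcal{E}) \leq \exp E(\eta_i \mid \mathcal{E}) = 0$. □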
Proof of Theorem. Assume $\xi$ is such that $|F^{T}F| > 0$. Note that $w(\eta)$ is decreasing in $|\eta|$ and so, by the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$,

\[ \log|M(\xi;\beta)| \geq \log|F^{T}F| + p\log w(\max_i|\eta_i|) . \]

We split $E\log|M(\xi;\beta)|$ into two components,

\[ E\log|M(\xi;\beta)| = E\big[\log|M(\xi;\beta)|\, I(\max_i|\eta_i| \leq \kappa)\big] + E\big[\log|M(\xi;\beta)|\, I(\max_i|\eta_i| > \kappa)\big], \tag{S1.8} \]

where $\kappa > 0$, and then show that both components are $> -\infty$.

Note that for $|\eta| \leq \kappa$, $w(\eta)$ is bounded below by a constant, $\lambda > 0$. Thus, if $\max_i|\eta_i| \leq \kappa$, then $\log|M(\xi;\beta)| \geq \log|F^{T}F| + p\log\lambda$ and so

\[ E\big[\log|M(\xi;\beta)|\, I(\max_i|\eta_i| \leq \kappa)\big] > -\infty . \tag{S1.9} \]

For $|\eta| > \kappa$, with $\kappa$ sufficiently large, by the asymptotic approximation,

\[ w(\eta) \geq L|\eta|e^{-\eta^2/2} \]

for some $L > 0$. Hence if $\max_i|\eta_i| > \kappa$, then

\[ \log|M(\xi;\beta)| \geq \log|F^{T}F| + p\log L + p\log\max_i|\eta_i| - p\max_i\eta_i^2/2 \geq \log|F^{T}F| + p\log L + p\log\kappa - p\max_i\eta_i^2/2 . \]

However, it is straightforward to show that if $E\beta_k^2 < \infty$ and $E|\beta_k\beta_l| < \infty$ for $k, l = 0, \ldots, p-1$, then $E\max_i\eta_i^2 < \infty$. This is sufficient to prove that

\[ E\big[\log|M(\xi;\beta)|\, I(\max_i|\eta_i| > \kappa)\big] > -\infty . \tag{S1.10} \]

Combining (S1.8), (S1.9) and (S1.10), we find that overall $E\log|M(\xi;\beta)| > -\infty$, and so $\mathcal{P}$ is non-singular. □


Lemma 6. Let $X$ be a random variable taking values in $A \subseteq \mathbb{R}$, with $A$ unbounded above, and let $s, t : A \to \mathbb{R}$ be measurable extended real-valued functions that satisfy (i) for all $\kappa \in \mathbb{R}$, $\sup_{x\in A,\, x\leq\kappa}|s(x)| < \infty$, (ii) $t(x)$ is increasing, and (iii) $r(x) = t(x)/s(x) \to 0$ as $x \to \infty$. Given the above, if $E[s(X)] = \infty$ then $E[s(X) - t(X)] = \infty$.

Proof. Note that from (iii) there exists $\kappa' > 0$ such that when $X > \kappa'$, we have $s(X) - t(X) > (1/2)s(X)$. When $X \leq \kappa'$, by (ii) we have $t(X) \leq t(\kappa')$. Hence,

\[ E[s(X) - t(X)] = E\{[s(X) - t(X)]\, I(X \leq \kappa') + [s(X) - t(X)]\, I(X > \kappa')\} \geq E\{[s(X) - t(\kappa')]\, I(X \leq \kappa') + (1/2)s(X)\, I(X > \kappa')\} . \]

By condition (i), $s(x)$ is bounded on $\{x \leq \kappa'\}$. Therefore the first term inside the expectation above is also bounded, and since $E[s(X)] = \infty$ we must have that $E\{s(X)\, I(X > \kappa')\} = \infty$. Hence the right hand side of the above inequality has infinite expectation, and so $E[s(X) - t(X)] = \infty$. □


Proof of Proposition. Similar to the decomposition (S1.8), we split the integral $E\log|M(\xi;\beta)|$ into two components,

\[ E\log|M(\xi;\beta)| = E\big[\log|M(\xi;\beta)|\, I(\min_i|\eta_i| \leq \kappa)\big] + E\big[\log|M(\xi;\beta)|\, I(\min_i|\eta_i| > \kappa)\big], \tag{S1.11} \]

where $\kappa > 0$. Note that $w(\eta)$ is symmetric and decreasing in $|\eta|$. Thus, by the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$, for $\min_i|\eta_i| \leq \kappa$ we have $\log|M(\xi;\beta)| \leq \log|F^{T}F| + \log w(0)$, so

\[ E\big[\log|M(\xi;\beta)|\, I(\min_i|\eta_i| \leq \kappa)\big] \leq \Pr(\min_i|\eta_i| \leq \kappa)\,[\log|F^{T}F| + \log w(0)] < \infty, \]

i.e. the first term in (S1.11) is always $< \infty$.

We now consider the second term in (S1.11). For $\min_i|\eta_i| > \kappa$, provided $\kappa$ is sufficiently large then by (2.8),

\[ \max_i w(\eta_i) = w(\min_i|\eta_i|) \leq L\min_i|\eta_i|\, e^{-\min_i\eta_i^2/2}, \]

for some $L > 0$. By the same lemma, for $\min_i|\eta_i| > \kappa$,

\[ \log|M(\xi;\beta)| \leq \log|F^{T}F| + p\log L + p\log\min_i|\eta_i| - (p/2)\min_i\eta_i^2 . \]

Assume that $|F^{T}F| > 0$. Let $X_1 = \min_i|\eta_i|$, $A_1 = [0, \infty)$, $s_1(X_1) = (p/2)X_1^2\, I(X_1 > \kappa)$, and $t_1(X_1) = p\log X_1\, I(X_1 > \kappa)$. We have that

\[ E\big[\log|M(\xi;\beta)|\, I(\min_i|\eta_i| > \kappa)\big] \leq E\big[t_1(X_1) - s_1(X_1) + (\log|F^{T}F| + p\log L)\, I(X_1 > \kappa)\big] . \tag{S1.12} \]

We may assume that $\kappa > 1$, in which case $t_1$ is increasing and so $X_1$, $A_1$, $s_1$, $t_1$ satisfy the conditions of Lemma 6. Hence, if $E[s_1(X_1)] = \infty$ then $E[t_1(X_1) - s_1(X_1)] = -\infty$, in which case, from (S1.12),

\[ E\big[\log|M(\xi;\beta)|\, I(\min_i|\eta_i| > \kappa)\big] = -\infty, \]

and so by (S1.11) we have that $E\log|M(\xi;\beta)| = -\infty$. Hence to prove the proposition it is sufficient to show that $E[s_1(X_1)] = \infty$; we demonstrate that this holds in the next paragraph.

Recall that on the event $\mathcal{E}$, defined in (S1.6), we have that $\min_i|\eta_i| \geq \min_i|f_j(x_i)||\beta_j| - \epsilon$. Thus, on $\mathcal{E}$,

\[ \min_i\eta_i^2 \geq \min_i|f_j(x_i)|^2|\beta_j|^2 - 2\epsilon\min_i|f_j(x_i)||\beta_j| + \epsilon^2 = s_2(X_2) - t_2(X_2) + \epsilon^2, \tag{S1.13} \]

where above $X_2 = \min_i|f_j(x_i)||\beta_j|$, $A_2 = [0, \infty)$, with $s_2(X_2) = X_2^2$ and $t_2(X_2) = 2\epsilon X_2$. From assumptions (ii) and (iii) of the proposition we have that $E[|\beta_j|^2 \mid \mathcal{E}] = \infty$ and $\min_i|f_j(x_i)| > 0$, thus

\[ E[s_2(X_2) \mid \mathcal{E}] = E\big[\min_i|f_j(x_i)|^2|\beta_j|^2 \mid \mathcal{E}\big] = \infty . \]

Hence, applying Lemma 6, we see that $E[s_2(X_2) - t_2(X_2) \mid \mathcal{E}] = \infty$ and so, by (S1.13), $E[\min_i\eta_i^2 \mid \mathcal{E}] = \infty$. To complete the proof we must consider the marginal expectation of $s_1(X_1) = (p/2)X_1^2\, I(X_1 > \kappa)$, where $X_1 = \min_i|\eta_i|$. Note that by assumption (i), $\Pr(\mathcal{E}) > 0$, thus

\[ E X_1^2 = E\min_i\eta_i^2 \geq \Pr(\mathcal{E})\, E(\min_i\eta_i^2 \mid \mathcal{E}) = \infty . \]

Finally, observe that $X_1^2 = X_1^2\, I(X_1 \leq \kappa) + X_1^2\, I(X_1 > \kappa)$ and

\[ 0 \leq E\{X_1^2\, I(X_1 \leq \kappa)\} \leq \kappa^2 . \]

Since $E X_1^2 = \infty$, we therefore have that $E\{X_1^2\, I(X_1 > \kappa)\} = \infty$. Hence $E[s_1(X_1)] = \infty$. As shown in the previous paragraph, this is enough to establish that $E\log|M(\xi;\beta)| = -\infty$, and the proposition is proved. □
Proof of Theorem. From the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$ and the fact that $w(\eta) = \exp(\eta)$,

\[ \log|M(\xi;\beta)| \geq p\min_i\log w_i + \log|F^{T}F| = p\min_i\eta_i + \log|F^{T}F| . \]

However, $\min_i\eta_i \geq -\max_i|\eta_i|$ and so

\[ \log|M(\xi;\beta)| \geq -p\max_i|\eta_i| + \log|F^{T}F| . \]

We know from the triangle-inequality argument preceding (S1.5) that $E\max_i|\eta_i| < \infty$ under the conditions given, and so we also have that $E\log|M(\xi;\beta)| > -\infty$ for the Poisson model. □


Proof of Proposition. First note from the lemma relating $|M(\xi;\beta)|$ to $|F^{T}F|$ that

\[ \log|M(\xi;\beta)| \leq p\max_i\eta_i + \log|F^{T}F| . \]

Thus, to establish that $E\log|M(\xi;\beta)| = -\infty$, it is sufficient to show that $E\max_i\eta_i = -\infty$. Similar to the argument built on the event $\mathcal{E}$ in (S1.6), the strategy is to find an event where $\eta_i$ is well approximated by $f_j(x_i)\beta_j$. Let $\mathcal{E}_2$ be an event such that $\beta_j < -1$ and $\sum_{k\neq j}|f_k(x_i)||\beta_k| < \epsilon$ for all $i$, where $\epsilon > 0$ satisfies

\[ \big||f_j(x_i)| - |f_j(x_{i'})|\big| > 2\epsilon \quad \text{for any } i, i' \text{ with } |f_j(x_i)| \neq |f_j(x_{i'})| . \]

For example, one possible definition is

\[ \mathcal{E}_2 = \{\beta : \beta_j < -1,\ |\beta_k| < \delta \text{ for all } k \neq j\}, \]

with $\delta = \epsilon/[(p-1)\max_{i,k}|f_k(x_i)|]$. On $\mathcal{E}_2$, by arguments similar to those used for (S1.7),

\[ \max_i\eta_i \leq \max_i\{f_j(x_i)\beta_j\} + \epsilon \leq \beta_j\min_i f_j(x_i) + \epsilon, \tag{S1.14} \]

where the second line follows since, by assumptions (i) and (v), we have $\max_i\{f_j(x_i)\beta_j\} = \beta_j\min_i f_j(x_i)$. By condition (iii), $E[\beta_j \mid \mathcal{E}_2] = -\infty$ and so, from (S1.14) and condition (v), we have $E[\max_i\eta_i \mid \mathcal{E}_2] = -\infty$. Moreover, by condition (ii), $\Pr(\mathcal{E}_2) > 0$ and so

\[ E[\max_i\eta_i\, I(\mathcal{E}_2)] = \Pr(\mathcal{E}_2)\, E[\max_i\eta_i \mid \mathcal{E}_2] = -\infty . \tag{S1.15} \]

Note that

\[ \max_i\eta_i = \max_i\eta_i\, I(\mathcal{E}_2) + \max_i\eta_i\, I(\mathcal{E}_2^{C}) . \tag{S1.16} \]

By assumptions (i) and (v), $f_j(x_i)\beta_j$ is negative and so we have $\max_i\eta_i \leq \sum_{k\neq j}\max_i|f_k(x_i)||\beta_k|$. Thus by assumption (iv), $E\{\max_i\eta_i\, I(\mathcal{E}_2^{C})\} < \infty$. Hence, by (S1.16),

\[ E\max_i\eta_i = \Pr(\mathcal{E}_2)\, E[\max_i\eta_i \mid \mathcal{E}_2] + \Pr(\mathcal{E}_2^{C})\, E[\max_i\eta_i \mid \mathcal{E}_2^{C}], \]

and, by (S1.15), we have $E\max_i\eta_i = -\infty$ and hence $E\log|M(\xi;\beta)| = -\infty$. □

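Proof of Proposition 9. Let $\zeta_\varepsilon$ denote a one-run exact design with design point $x_\varepsilon = -1/\log\varepsilon$, with $x_\varepsilon \to 0$ as $\varepsilon \to 0$. We show that, compared to $\zeta_\varepsilon$, the relative Bayesian D-efficiency under $\mathcal{P}_\varepsilon$ of a fixed (exact) design $\xi$ tends to zero as $\varepsilon \to 0$. It is not claimed that $\zeta_\varepsilon$ is Bayesian D-optimal. However, the relative Bayesian D-efficiency of $\xi$ is an upper bound for the absolute Bayesian D-efficiency of $\xi$, and so this argument is sufficient to establish that $\text{Bayes-eff}(\xi;\mathcal{P}_\varepsilon) \to 0$ as $\varepsilon \to 0$.

First note that under $\mathcal{P}_\varepsilon$,

\[ E(\beta) = \int_\varepsilon^a \frac{1}{\theta}\,\frac{d\theta}{a - \varepsilon} = \frac{\log a - \log\varepsilon}{a - \varepsilon} \to \infty \quad \text{as } \varepsilon \to 0 . \tag{S1.17} \]

Observe that, for the $\beta$-parameterization, using (2.1),

\[ \phi(\xi;\mathcal{P}_\varepsilon) - \phi(\zeta_\varepsilon;\mathcal{P}_\varepsilon) \leq \log\sum_{i=1}^{n} x_i^2 - 2\min_{i : x_i > 0}\{x_i\}\, E\beta - 2\log x_\varepsilon + 2 E\beta\, x_\varepsilon = \log\sum_{i=1}^{n} x_i^2 - 2\Big(\min_{i : x_i > 0} x_i - x_\varepsilon\Big) E\beta - 2\log x_\varepsilon . \]

Using (S1.17) and the definition of $x_\varepsilon$, for $\varepsilon$ sufficiently small,

\[ \phi(\xi;\mathcal{P}_\varepsilon) - \phi(\zeta_\varepsilon;\mathcal{P}_\varepsilon) \leq \log\sum_{i=1}^{n} x_i^2 + 2K\log\varepsilon - 2\log\Big(\frac{-1}{\log\varepsilon}\Big), \]

for some $K > 0$. Hence, provided $\varepsilon$ is sufficiently small, the relative Bayesian D-efficiency satisfies

\[ \exp\{\phi(\xi;\mathcal{P}_\varepsilon) - \phi(\zeta_\varepsilon;\mathcal{P}_\varepsilon)\} \leq \Big(\sum_{i=1}^{n} x_i^2\Big)(\varepsilon^{K}\log\varepsilon)^2 \to 0 \quad \text{as } \varepsilon \to 0, \]

which is sufficient to prove the claim. The limit above can be found using L'Hospital's rule. Using (2.2), the same result for the Bayesian D-efficiency also holds under the $\theta$-parameterization. □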



